Minimax Robust Hypothesis Testing

Gokhan Gul, Student Member, IEEE, Abdelhak M. Zoubir, Fellow, IEEE


Abstract —The minimax robust hypothesis testing problem for 
the case where the nominal probability distributions are subject 
to both modeling errors and outliers is studied in two parts. First,
a robust hypothesis testing scheme based on a relative entropy 
distance is designed. This approach provides robustness with 
respect to modeling errors and is a generalization of a previous 
work proposed by Levy. Then, it is shown that this scheme 
can be combined with Huber’s robust test through a composite 
uncertainty class, for which the existence of a saddle value 
condition is also proven. The composite version of the robust 
hypothesis testing scheme as well as the individual robust tests 
are extended to fixed sample size and sequential probability 
ratio tests. The composite model is shown to extend to robust 
estimation problems as well. Simulation results are provided to 
validate the proposed assertions. 

Index Terms —Detection, hypothesis testing, robustness, least 
favorable distributions, minimax optimization, sequential probability
ratio test.

I. Introduction 

The detection of the presence or absence of an event with
a specified accuracy is fundamental to statistical inference, and
binary hypothesis testing is the usual starting point. Binary
hypothesis testing is used in many applications, for instance
radar, sonar, digital communications and seismology.
A natural extension of binary hypothesis testing is multiple
hypothesis testing, which forms a basis for classification; its
importance is evident, for example, in pattern recognition.
The necessity for statistical inference lies in the randomness
that is inherent in the natural world: the received data, or
signal, has an additive random component or, as in cognitive
radio, must be modeled in a purely random manner. The degree
of randomness in the received data usually turns out to be a
metric of detection accuracy [1].

Formally, any real-world binary decision making problem can be
modeled by a binary hypothesis test, where under each hypothesis
$\mathcal{H}_j$ the received data $y \in \mathbb{R}$ follows a
particular probability distribution $F_j$, $j \in \{0,1\}$. Accordingly,
the aim is to find a decision rule $\delta$ which assigns each $y$ either
to $\mathcal{H}_0$ or $\mathcal{H}_1$, depending on a certain objective
function, for instance the error probability. An optimal decision
rule $\delta$ minimizes the objective function if $y$ indeed follows $F_j$
under $\mathcal{H}_j$, $j \in \{0,1\}$. However, this condition is too
strict, and in practice there are often deviations from the model
assumptions [2].

A traditional way of considering the deviations from the nominal
distributions is via parametric modeling. Such parameters
could be, for instance, the imprecisely known frequency of a
received signal or the unknown variance of a noise source. The
shape of the probability distributions under each hypothesis
is still assumed to be completely known. However, this assumption
is invalid for various applications, for instance sonar
or cognitive radio. In such cases, a parametric model is
inappropriate, and using one anyway results in severe
performance degradation.

The shortcomings of parametric modeling necessitate the use
of non-parametric approaches. Such approaches are robust, are
cheap to implement in practice, make (almost) no assumption
on the nominal distributions, and their performance is
acceptable for a variety of detection problems [3]. However,
compared to an optimum detector, their performance can be
far from satisfactory, especially if some a priori knowledge
about the nominal distributions is available.
Therefore, a more realistic approach should be tunable, depending
on how much knowledge is available on the nominal
distributions, how much of a robustness/performance trade-off is
allowed, and how complex the detector structure can
be. In this context, robust minimax hypothesis testing falls
between parametric and non-parametric detection; it coincides
with parametric detection when the robustness parameters are
chosen to be zero, and it tends to a non-parametric test, a
sign test, when the robustness parameters are chosen to be at
their maximum [4, p. 271].

A well-known formulation of minimax hypothesis testing is
based on building uncertainty sets $\mathcal{F}_j$ for each hypothesis
$\mathcal{H}_j$, where each $\mathcal{F}_j$ is populated by all probability
distributions $G_j$ that lie within a distance $\varepsilon_j$ of the nominal
distribution $F_j$ with respect to some well-defined distance $D$,
$j \in \{0,1\}$. The choice of the parameters $\varepsilon_0$ and $\varepsilon_1$
determines the degree of robustness, and they can vary with the
application. The eventual aim of the designer is to determine a pair
of distributions $(G_0, G_1) \in \mathcal{F}_0 \times \mathcal{F}_1$ and a
decision rule $\delta$ such that a predefined performance measure is met,
e.g. a bounded error probability. This type of optimization is called
minimax optimization, and the distributions solving this problem are
called least favorable distributions (LFDs).

In this research field, there are two main approaches, one
initiated by Huber [5] and the other by Dabak
et al. [6] and Levy [7]. In Huber's work, which was published
as early as 1965, he proposed a robust version of the
probability ratio test for the $\epsilon$-contamination and total variation
classes of distributions. He proved the existence of LFDs for
both classes and showed that the resulting robust test was a
censored/clipped version of the nominal likelihood ratio test.
In a follow-up work, he showed that the same conclusion
could be made if the $\epsilon$-contamination model was extended
to a larger class, which included five different distances as
special cases [8]. A more general uncertainty class, called
2-alternating capacities, was proposed later by Huber and
Strassen [9]. However, it was noted in [?] that the approach
in [?] is more suitable for engineering applications due to its
simplicity. In a recent work, an uncertainty class which allows
the use of composite distances for robust hypothesis testing has
been proposed [10].

The robust tests pioneered by Huber were designed for modeling
outliers. More recent works by Dabak [6], and later by
Levy [7], show that when the distance $D$ is chosen to be
the relative entropy, the resulting robust tests are different
from Huber's robust test, depending on the choice of the
objective function to be minimized. While Dabak's approach
minimizes the relative entropy between the LFDs and provides
an asymptotically robust test, Levy's robust test minimizes the
type I and type II errors and provides a minimax robust test
for a single sample. In [?], it was noted that the latter two
robust tests are more appropriate for modeling errors than
for outliers. Recently, it has been shown that Levy's robust test
can be extended to distributed detection problems where the
communication from sensors to the fusion center is constrained
[11]. It has also been shown that considering the squared
Hellinger distance instead of the relative entropy might provide
a more flexible design [12], [13].

In this paper, a robust hypothesis testing scheme based on the
Kullback-Leibler (KL) divergence is proposed. The problem
formulation does not make any assumption about the choice
of nominal distributions and, thus, it includes [7] as a special
case. This robust scheme is then extended by use of a composite
uncertainty set, which is built with respect to two different
distances. The first distance models the misassumptions on
the nominal distributions and the second distance models the
outliers. It is proven that LFDs for this composite model
exist and therefore a single test can be robust with respect
to both modeling errors and outliers. Notice that this
composite class is different from the one proposed in [10].
Finally, the designed robust tests are extended to fixed sample
size and sequential probability ratio tests. It is also shown
that the composite model can be extended to robust estimation
problems.

The organization of this paper is as follows. In the following
section, the LFDs and the robust decision rule are derived
when the uncertainty sets are closed balls with respect to the
KL divergence. The uniqueness and monotonicity properties
of the LFDs are further proven. It is shown that the proposed
model reduces to the model given in [7] when the nominal
distributions are symmetric and the nominal likelihood ratio is
monotone. For comparison, the asymptotically robust
test [6] is presented and the existence of LFDs is proven without
considering the geometrical aspects of hypothesis testing.
The implications of considering other distances to obtain the
LFDs and the robust decision rules are also discussed. In
Sec. III, the composite uncertainty set, which models both
outliers and modeling errors, is introduced.
When this model reduces to single robust tests, the density
function of the log-likelihood ratios is derived for performance
evaluation as well as for asymptotic analysis. Similarly, the
equations from which one can uniquely determine the maximum
of the robustness parameters, above which a minimax
robust test cannot be designed, are also derived. In Sec. IV,
the robust methods are extended to fixed sample size tests.
In particular, it is shown whether the robust tests maintain their
LFD properties. The section concludes with the
limiting tests and the formulation of the asymptotic analysis. In
Sec. V, the sequential probability ratio test is robustified by
replacing the nominal likelihood ratios with robust ones. It is
investigated whether the LFD properties are preserved in general
as well as asymptotically for both robustified sequential
tests. In Sec. VI, an extension of the composite uncertainty
model to the design of robust estimation problems is briefly
introduced. In Sec. VII, simulation results are presented, and
finally in Sec. VIII the paper is concluded.


II. Robust detection for modeling errors 

Let $(\Omega, \mathcal{A})$ be a measurable space with the probability
measures $F_0$, $F_1$, $G_0$ and $G_1$ defined on it, which are
absolutely continuous with respect to a dominating measure $\mu$,
e.g. $\mu = F_0 + F_1 + G_0 + G_1$. Furthermore, let $f_0$, $f_1$, $g_0$
and $g_1$ be the density functions of the probability measures
$F_0$, $F_1$, $G_0$ and $G_1$ with respect to $\mu$, respectively. Define the
uncertainty classes

$$\mathcal{G}_j = \{ g_j : D(g_j, f_j) \le \varepsilon_j \}, \quad j \in \{0,1\}, \tag{1}$$

where every $g_j$ lies within $\varepsilon_j > 0$ of the nominal density
$f_j$ with respect to the KL-divergence, i.e.

$$D(g_j, f_j) := \int_{\mathbb{R}} \ln(g_j/f_j)\, g_j \, d\mu, \quad j \in \{0,1\}. \tag{2}$$
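As a numerical illustration of (2), the following minimal sketch approximates the KL-divergence on a uniform grid and checks membership in an uncertainty class of the form (1). The Gaussian densities, the grid, the radius `eps` and all function names are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def gaussian_pdf(y, mean, std=1.0):
    """Density of N(mean, std^2) evaluated on the array y."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def kl_divergence(g, f, dy):
    """Riemann-sum approximation of D(g, f) = int ln(g/f) g dmu on a uniform grid."""
    return float(np.sum(g * np.log(g / f)) * dy)

y = np.linspace(-12.0, 12.0, 24001)   # integration grid (assumed wide enough)
dy = y[1] - y[0]
f = gaussian_pdf(y, -1.0)             # hypothetical nominal density f_j
g = gaussian_pdf(y, -0.5)             # hypothetical candidate density g_j
eps = 0.2                             # hypothetical robustness parameter

d = kl_divergence(g, f, dy)
in_class = d <= eps                   # membership test for the class in (1)
print(d, in_class)
```

For two unit-variance Gaussians the divergence has the closed form (difference of means)^2 / 2, which gives a quick sanity check of the grid approximation.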

Now, consider the composite hypothesis testing problem

$$\mathcal{H}_0 : Y \sim G_0$$
$$\mathcal{H}_1 : Y \sim G_1 \tag{3}$$

where $Y$ is a real-valued random variable (r.v.) on $\Omega$. Define a
randomized decision rule (function) $\delta \in \Delta$, where $\Delta$ stands for
the set of all possible decision rules. Assume for the moment
that $\varepsilon_0 = \varepsilon_1 = 0$. Then, the decision rule

$$\delta(y) = \begin{cases} 0, & l(y) < \rho \\ \kappa(y), & l(y) = \rho \\ 1, & l(y) > \rho \end{cases} \tag{4}$$

for some threshold $\rho = P(\mathcal{H}_0)/P(\mathcal{H}_1)$ and a function
$\kappa : \mathbb{R} \to [0,1]$, given the likelihood ratio $l(y) := f_1/f_0(y)$, is
optimum in the sense that it minimizes the error probability
both in the Bayes and the Neyman-Pearson sense and results
in two types of errors: the false alarm probability

$$P_F(\delta, f_0) = \int_{\mathbb{R}} \delta f_0 \, d\mu \tag{5}$$

and the miss detection probability

$$P_M(\delta, f_1) = \int_{\mathbb{R}} (1-\delta) f_1 \, d\mu. \tag{6}$$

Accordingly, the minimum error probability is given by

$$P_E(\delta, f_0, f_1) = P(\mathcal{H}_0) P_F(\delta, f_0) + P(\mathcal{H}_1) P_M(\delta, f_1). \tag{7}$$
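The nominal test (4) and the error probabilities (5)-(7) can be made concrete with a small sketch. It assumes unit-variance Gaussian nominals with means `m0 < m1`, resolves ties in (4) in favour of the null hypothesis (they have probability zero here), and uses hypothetical function names; none of these choices come from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lrt_decision(y, rho=1.0, m0=-1.0, m1=1.0):
    """Decision rule (4) for N(m0,1) versus N(m1,1); ties are assigned to H0."""
    log_l = (m1 - m0) * y - 0.5 * (m1 ** 2 - m0 ** 2)   # ln f1(y)/f0(y)
    return 1 if log_l > math.log(rho) else 0

def error_probabilities(p0=0.5, m0=-1.0, m1=1.0):
    """False alarm (5), miss detection (6) and error probability (7)."""
    rho = p0 / (1.0 - p0)                                # threshold P(H0)/P(H1)
    t = (math.log(rho) + 0.5 * (m1 ** 2 - m0 ** 2)) / (m1 - m0)  # l(y) > rho <=> y > t
    p_f = 1.0 - norm_cdf(t - m0)
    p_m = norm_cdf(t - m1)
    return p_f, p_m, p0 * p_f + (1.0 - p0) * p_m

p_f, p_m, p_e = error_probabilities()
print(p_f, p_m, p_e)
```

With equal priors the threshold is 1, the test reduces to the sign of `y`, and both error types coincide by symmetry.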

Remark II.1. The sets $\mathcal{G}_0$ and $\mathcal{G}_1$ are not compact in the
topology induced by the distance $D$. However, since $D$ is
a convex function, $\mathcal{G}_0$ and $\mathcal{G}_1$ are convex sets. As a result,
$\mathcal{G}_0 \times \mathcal{G}_1$ is also convex. Given the a priori probabilities
$P(\mathcal{H}_0)$ and $P(\mathcal{H}_1)$, the probability of error $P_E(\delta, f_0, f_1)$ is
continuous, real-valued and linear, and therefore both convex
and concave, in all three terms $\delta$, $f_0$, $f_1$. In general, the space
of all randomized decision rules $\Delta = C^0(\mathbb{R}, [0,1])$ is not
compact. The compactness condition, however, is not required
because the error minimizing decision rules are known to exist
and to be the likelihood ratio test for all $(g_0, g_1) \in \mathcal{G}_0 \times \mathcal{G}_1$.
Let $\delta_1$ and $\delta_2$ be two decision functions chosen from $\Delta$. Then,
for $\delta = \alpha\delta_1 + (1-\alpha)\delta_2$, $0 < \alpha < 1$, we have $\delta \in \Delta$
and therefore $\Delta$ is convex. Note that any finitely supported
quantization of $g_0$ and $g_1$ makes both $\mathcal{G}_0 \times \mathcal{G}_1$ and $\Delta$ compact
with respect to the standard topology. This is a straightforward
result of the Heine-Borel theorem [14].

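Remark II.1 indicates that Sion's minimax theorem [15] is
applicable, which gives

$$\sup_{(g_0,g_1)\in\mathcal{G}_0\times\mathcal{G}_1} \min_{\delta\in\Delta} P_E(\delta, g_0, g_1) = \min_{\delta\in\Delta} \sup_{(g_0,g_1)\in\mathcal{G}_0\times\mathcal{G}_1} P_E(\delta, g_0, g_1). \tag{8}$$

Hence, $P_E(\delta, g_0, g_1)$ possesses a saddle value on $\Delta \times (\mathcal{G}_0 \times \mathcal{G}_1)$
with the least favorable densities $(\hat g_0, \hat g_1) \in \mathcal{G}_0 \times \mathcal{G}_1$ and the
robust decision rule $\hat\delta \in \Delta$, i.e., $\{\hat\delta, (\hat g_0, \hat g_1)\}$, resulting from
Eq. (8). Consequently

$$P_E(\delta, \hat g_0, \hat g_1) \ge P_E(\hat\delta, \hat g_0, \hat g_1) \ge P_E(\hat\delta, g_0, g_1). \tag{9}$$

Since $P_E$ is distinct in $g_0$ and $g_1$, it follows that

$$P_F(\hat\delta, g_0) \le P_F(\hat\delta, \hat g_0), \qquad P_M(\hat\delta, g_1) \le P_M(\hat\delta, \hat g_1). \tag{10}$$

Theorem II.1. Let $l_l$ and $l_u$ be two real numbers with
$0 < l_l < 1 < l_u < \infty$. Then, for

$$z(l_l,l_u) = \int_{l<l_l} f_1\,d\mu + \int_{l_l\le l\le l_u} \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} f_1\,d\mu + k(l_l,l_u)\int_{l>l_u} f_1\,d\mu \tag{11}$$

and

$$k(l_l,l_u) = \frac{\int_{l<l_l} (l_l - l)\, f_0\,d\mu}{\int_{l>l_u} (l - l_u)\, f_0\,d\mu}, \tag{12}$$

the least favorable densities

$$\hat g_0(y) = \begin{cases} \frac{l_l}{z(l_l,l_u)} f_0(y), & l(y) < l_l \\ \frac{1}{z(l_l,l_u)} \left(l_l^{-1} l(y)\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} f_1(y), & l_l \le l(y) \le l_u \\ \frac{k(l_l,l_u)\, l_u}{z(l_l,l_u)} f_0(y), & l(y) > l_u \end{cases}$$

$$\hat g_1(y) = \begin{cases} \frac{1}{z(l_l,l_u)} f_1(y), & l(y) < l_l \\ \frac{1}{z(l_l,l_u)} \left(l_l^{-1} l(y)\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} f_1(y), & l_l \le l(y) \le l_u \\ \frac{k(l_l,l_u)}{z(l_l,l_u)} f_1(y), & l(y) > l_u \end{cases} \tag{13}$$

and the decision rule

$$\hat\delta(y) = \begin{cases} 0, & l(y) < l_l \\ \frac{\ln(l(y)/l_l)}{\ln(l_u/l_l)}, & l_l \le l(y) \le l_u \\ 1, & l(y) > l_u \end{cases} \tag{14}$$

which is equivalent to the robust likelihood ratio

$$\hat l(y) = \begin{cases} l_l^{-1} l(y), & l(y) < l_l \\ 1, & l_l \le l(y) \le l_u \\ l_u^{-1} l(y), & l(y) > l_u \end{cases} \tag{15}$$

form the saddle value condition for Eq. (8). Furthermore, the
parameters $l_l$ and $l_u$ can be determined by solving

$$-\ln(z(l_l,l_u)) + \frac{1}{z(l_l,l_u)} \Bigg[ l_l \ln(l_l) \int_{l<l_l} f_0\,d\mu + \int_{l_l\le l\le l_u} \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} \ln\!\left( \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} l \right) f_1\,d\mu + k(l_l,l_u)\, l_u \ln(k(l_l,l_u)\, l_u) \int_{l>l_u} f_0\,d\mu \Bigg] = \varepsilon_0 \tag{16}$$

and

$$-\ln(z(l_l,l_u)) + \frac{1}{z(l_l,l_u)} \Bigg[ \int_{l_l\le l\le l_u} \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} \ln\!\left( \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} \right) f_1\,d\mu + k(l_l,l_u) \ln(k(l_l,l_u)) \int_{l>l_u} f_1\,d\mu \Bigg] = \varepsilon_1. \tag{17}$$

Proof: The solution of the minimax non-linear optimization
problem

$$\max_{g_0\in\mathcal{G}_0} P_F(\delta, g_0), \qquad \max_{g_1\in\mathcal{G}_1} P_M(\delta, g_1)$$
$$\text{s.t.}\quad g_j > 0, \quad \Upsilon(g_j) = \int_{\mathbb{R}} g_j\,d\mu = 1, \quad j\in\{0,1\}$$
$$\min_{\delta\in\Delta} P_E(\delta, \hat g_0, \hat g_1), \tag{18}$$

directly leads to the assertion. First, the maximization stage
is solved by considering the Karush-Kuhn-Tucker (KKT)
multipliers. The subsequent minimization and optimization
stages complete the proof.

A. Maximization stage

Consider the Lagrangian

$$L^0(g_0,\lambda_0,\mu_0) = P_F(\delta, g_0) + \lambda_0(\varepsilon_0 - D(g_0,f_0)) + \mu_0(1 - \Upsilon(g_0)),$$
$$L^1(g_1,\lambda_1,\mu_1) = P_M(\delta, g_1) + \lambda_1(\varepsilon_1 - D(g_1,f_1)) + \mu_1(1 - \Upsilon(g_1)), \tag{19}$$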


where $\mu_j$, $\lambda_j$ are the KKT multipliers, which are imposed to
satisfy the constraints. Since $L^j$ is concave in $g_j$, a globally
optimum solution is guaranteed if the necessary KKT conditions
are met [16]. Writing (19) explicitly for $j = 0$, it follows
that

$$L^0(g_0,\lambda_0,\mu_0) = \int_{\mathbb{R}} \left[ \delta g_0 + \lambda_0\varepsilon_0 - \lambda_0 \ln\frac{g_0}{f_0}\, g_0 + \mu_0 - \mu_0 g_0 \right] d\mu. \tag{20}$$

Imposing the first KKT condition (stationarity), by taking the
Gateaux derivative of Eq. (20) in the direction of $\psi$, yields

$$\int_{\mathbb{R}} \left[ \delta - \lambda_0 \ln\frac{g_0}{f_0} - \lambda_0 - \mu_0 \right] \psi \, d\mu, \tag{21}$$

which implies

$$\delta - \lambda_0 \ln\frac{g_0}{f_0} - \lambda_0 - \mu_0 = 0, \tag{22}$$


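since $\psi$ is an arbitrary function. Hence, $\hat g_0$, and in a similar
way $\hat g_1$ by solving (19) for $P_M$, can be obtained. The results
are

$$\hat g_0 = c_1^{1-\delta} c_2^{\delta} f_0, \qquad \hat g_1 = c_3^{1-\delta} c_4^{\delta} f_1 \tag{23}$$

where $c_1 = \exp\left(-\frac{\lambda_0+\mu_0}{\lambda_0}\right)$, $c_2 = \exp\left(-\frac{-1+\lambda_0+\mu_0}{\lambda_0}\right)$,
$c_3 = \exp\left(-\frac{-1+\lambda_1+\mu_1}{\lambda_1}\right)$ and $c_4 = \exp\left(-\frac{\lambda_1+\mu_1}{\lambda_1}\right)$. This leads to the
robust likelihood ratio

$$\hat l = \frac{\hat g_1}{\hat g_0} = \frac{c_3}{c_1} \exp\left(-\delta \ln\frac{c_2 c_3}{c_1 c_4}\right) l. \tag{24}$$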
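B. Minimization stage

The decision rule $\hat\delta$, which minimizes $P_E$ for any $(g_0, g_1) \in
\mathcal{G}_0 \times \mathcal{G}_1$, is known to be the likelihood ratio test (4). Solving
$\hat l = 1$ from Eq. (24) and rewriting Eq. (4) with $\rho = 1$ for $\hat l$
yields

$$\hat\delta = \begin{cases} 0, & \hat l < 1 \\ \frac{\ln\left((c_3/c_1)\, l\right)}{\ln\left(c_2 c_3/(c_1 c_4)\right)}, & \hat l = 1 \\ 1, & \hat l > 1 \end{cases} \tag{25}$$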
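The robust likelihood ratio (15) and the randomized decision rule (14) of Theorem II.1 can be sketched as two small functions of the nominal likelihood ratio value. The thresholds `l_l` and `l_u` are taken as given here; in the paper they follow from (16) and (17). The function names are hypothetical.

```python
import math

def robust_likelihood_ratio(l, l_l, l_u):
    """Robust likelihood ratio (15): the nominal ratio is rescaled below l_l
    and above l_u, and equals one on the interval [l_l, l_u]."""
    if l < l_l:
        return l / l_l
    if l > l_u:
        return l / l_u
    return 1.0

def robust_decision(l, l_l, l_u):
    """Decision rule (14): 0 below l_l, 1 above l_u, and a randomization
    probability that grows with ln(l) in between."""
    if not 0.0 < l_l < l_u:
        raise ValueError("thresholds must satisfy 0 < l_l < l_u")
    if l < l_l:
        return 0.0
    if l > l_u:
        return 1.0
    return math.log(l / l_l) / math.log(l_u / l_l)

# Hypothetical thresholds, chosen only to exercise the three regions.
print(robust_decision(1.0, 0.5, 2.0), robust_likelihood_ratio(4.0, 0.5, 2.0))
```

The value returned by `robust_decision` on the middle interval is the probability of deciding for the alternative hypothesis, so the rule moves continuously from 0 at `l_l` to 1 at `l_u`.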


Applying (25) to (23), the least favorable distributions with
respect to their density functions are obtained as

$$\hat g_0 = \begin{cases} c_1 f_0, & \hat l < 1 \\ c_0\, l^{\frac{\ln(c_2/c_1)}{\ln(c_2 c_3/(c_1 c_4))}} f_0, & \hat l = 1 \\ c_2 f_0, & \hat l > 1 \end{cases} \qquad \hat g_1 = \begin{cases} c_3 f_1, & \hat l < 1 \\ c_0\, l^{\frac{\ln(c_4/c_3)}{\ln(c_2 c_3/(c_1 c_4))}} f_1, & \hat l = 1 \\ c_4 f_1, & \hat l > 1 \end{cases} \tag{26}$$

where $c_0 = \exp\left( \frac{\ln c_2 \ln c_3 - \ln c_1 \ln c_4}{\ln(c_2 c_3) - \ln(c_1 c_4)} \right)$. The unknown
parameters can be obtained by imposing the constraints, or
equivalently by solving the non-linear equations

$$c_1 \int_{\hat l<1} f_0\,d\mu + \int_{\hat l=1} \Phi\,d\mu + c_2 \int_{\hat l>1} f_0\,d\mu = 1$$
$$c_3 \int_{\hat l<1} f_1\,d\mu + \int_{\hat l=1} \Phi\,d\mu + c_4 \int_{\hat l>1} f_1\,d\mu = 1$$
$$c_1 \ln c_1 \int_{\hat l<1} f_0\,d\mu + \int_{\hat l=1} \Phi \ln\frac{\Phi}{f_0}\,d\mu + c_2 \ln c_2 \int_{\hat l>1} f_0\,d\mu = \varepsilon_0$$
$$c_3 \ln c_3 \int_{\hat l<1} f_1\,d\mu + \int_{\hat l=1} \Phi \ln\frac{\Phi}{f_1}\,d\mu + c_4 \ln c_4 \int_{\hat l>1} f_1\,d\mu = \varepsilon_1 \tag{27}$$

where $\Phi = c_0 \exp\left( \frac{\ln(c_4/c_3)}{\ln(c_2 c_3/(c_1 c_4))} \ln l \right) f_1$. Note that the first
two equations are required to make sure that $\hat g_0$ and $\hat g_1$ are
density functions, i.e., they integrate to one, and the other two
equations are required to guarantee that $\hat g_0 \in \mathcal{G}_0$ and $\hat g_1 \in \mathcal{G}_1$.

C. Optimization stage

To complete the proof it is necessary to explain how $\hat\delta$, $\hat g_0$,
$\hat g_1$ and the nonlinear equations can be represented in terms of
$l_l$ and $l_u$. Let $l_l = c_1/c_3$ and $l_u = c_2/c_4$. Then, considering
$\hat l = \hat g_1/\hat g_0$ from (26), it follows that

$$R_1 := \{y : l(y) < l_l\} = \{y : \hat l(y) < 1\}$$
$$R_2 := \{y : l_l \le l(y) \le l_u\} = \{y : \hat l(y) = 1\}$$
$$R_3 := \{y : l(y) > l_u\} = \{y : \hat l(y) > 1\}$$

Rewriting the integrals with the new limits (over
$(R_1, R_2, R_3)$), using the substitutions $c_1 := c_3 l_l$ and
$c_2 := c_4 l_u$, dividing both sides of the first two equations in
(27) by $c_3$, and equating them to each other via $1/c_3$ results
in $c_4 = k(l_l,l_u)\, c_3$. Accordingly, it follows that

$$\Phi = c_3 \left(l_l^{-1} l\right)^{\frac{\ln(k(l_l,l_u))}{\ln(l_u/l_l)}} f_1. \tag{28}$$

This allows the second equation in (27) to be written as $c_3 :=
1/z(l_l,l_u)$. Now, all constants $c_1$, $c_2$, $c_3$ and $c_4$ as well as $\Phi$ are
parameterized by $l_l$ and $l_u$. Thus, the robust likelihood ratio and
$\hat\delta$, $\hat g_0$, $\hat g_1$ can be rewritten as given in Theorem II.1. Finally,
the last two equations of (27) reduce to (16) and (17). This
completes the proof.

D. Monotonicity of the relative entropy

In the sequel it is shown that ordering in likelihood ratios
implies ordering in KL-divergence. This explains the monotonic
behavior of LFDs for increasing robustness parameters
given that $l$ is monotone. The theory that will be presented
will also be used in the next sections.


Proposition II.2. Let $F$ and $G$ be two probability measures on
$(\Omega, \mathcal{A})$ with $dF/dG$ a non-decreasing function. Then, $G(y) \ge
F(y)$ for all $y \in \Omega$.

Proof: Due to a special case of the Fortuin-Kasteleyn-Ginibre
(FKG) inequality, for any random variable $X$
and any two positive increasing functions $\varphi$, $\psi$ we have
$E[\varphi(X)\psi(X)] \ge E[\varphi(X)]\,E[\psi(X)]$. Applying this to $X$
distributed according to $G$ and the functions $\varphi := 1_{[c,+\infty)}$,
where $1_{[\cdot]}$ is the indicator function, and $\psi := dF/dG$, we
get $G(y) \ge F(y)$ for all $y \in \Omega$. ■
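Proposition II.2 can be checked numerically on a simple pair of distributions. The sketch below assumes F = N(1,1) and G = N(0,1), for which dF/dG(y) = exp(y - 1/2) is non-decreasing; the grid and the function name are illustrative choices.

```python
import math

def norm_cdf(y, mean=0.0):
    """Cumulative distribution function of N(mean, 1)."""
    return 0.5 * (1.0 + math.erf((y - mean) / math.sqrt(2.0)))

# F = N(1,1), G = N(0,1): the ratio dF/dG(y) = exp(y - 0.5) is increasing in y.
grid = [-6.0 + 0.01 * i for i in range(1201)]
ordered = all(norm_cdf(y, 0.0) >= norm_cdf(y, 1.0) for y in grid)
print(ordered)   # G(y) >= F(y) on the whole grid
```

A single example does not prove the statement, but it shows the direction of the inequality: the distribution with the increasing density ratio puts more mass to the right, so its distribution function lies below.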

Remark II.2. Let $X$ and $Y$ be two random variables defined
on the same probability space $(\Omega, \mathcal{A})$, having continuous
cumulative distribution functions $F$ and $G$, respectively. $X$
is called stochastically larger than $Y$, i.e., $X \succeq_{st} Y$, if
$G(y) \ge F(y)$ for all $y$.

Corollary II.3. For every non-decreasing function $\varphi$, $X \succeq_{st}
Y \Rightarrow \varphi(X) \succeq_{st} \varphi(Y)$, hence $X \succeq_{st} Y \Rightarrow E[\varphi(X)] \ge
E[\varphi(Y)]$.

The proof of Corollary II.3 is simple and can be found, for
example, in [17].


Theorem II.4. Let $X_0$, $Y_0$, $X_1$, and $Y_1$ be four continuous
random variables defined on $(\Omega, \mathcal{A})$ and having distinct
densities $f_0$, $g_0$, $f_1$, and $g_1$, respectively, with $f_1/g_1$, $g_1/g_0$,
and $g_0/f_0$ all being non-decreasing functions. Then,

$$D(f_1,f_0) \ge D(g_1,g_0) \quad\text{and}\quad D(f_0,f_1) \ge D(g_0,g_1). \tag{29}$$
Proof: By Prop. II.2 and Remark II.2 we have $Y_1 \succeq_{st}
Y_0$ and $Y_0 \succeq_{st} X_0$ since $g_1/g_0$ and $g_0/f_0$ are non-decreasing
functions. Increasing $f_1/g_1$ and $g_1/g_0$ implies
increasing $f_1/g_0$, and using Corollary II.3 and denoting
$\phi(Y) = \ln g_0/f_1(Y)$, we have $E_{X_0}[\phi(Y)] \ge E_{Y_0}[\phi(Y)]$.
Hence, the identity $D(f_0,f_1) = E_{X_0}[\phi(Y)] + D(f_0,g_0)$,
together with $E_{X_0}[\phi(Y)] \ge E_{Y_0}[\phi(Y)]$, results in $D(f_0,f_1) \ge
D(f_0,g_0) + D(g_0,f_1) \Rightarrow D(f_0,f_1) \ge D(g_0,f_1)$. It is well
known that

$$E_{Y_1}\left[\ln\frac{g_1(Y)}{f_1(Y)}\right] \ge 0. \tag{30}$$

Again, using Corollary II.3 and denoting $\psi(Y) =
\ln f_1/g_1(Y)$, we have $E_{Y_1}[\psi(Y)] \ge E_{Y_0}[\psi(Y)]$, which implies
$-E_{Y_0}[\psi(Y)] \ge 0$ in comparison with $-E_{Y_1}[\psi(Y)] \ge 0$.
We conclude that $D(f_0,f_1) \ge D(g_0,f_1)$ together with
$-E_{Y_0}[\psi(Y)] \ge 0$ implies $D(f_0,f_1) \ge D(g_0,g_1)$. The proof
for the case $D(f_1,f_0) \ge D(g_1,g_0)$ is similar and is omitted. ■

Now, let the likelihood ratio with respect to the nominal
distributions, $f_1/f_0$, be monotonically increasing. From Eq. (13)
it follows that $\hat g_1/\hat g_0$ and $f_1/\hat g_1$ are all non-decreasing
functions. Theorem II.4 indicates that $D(f_j, f_{1-j}) \ge D(\hat g_j, \hat g_{1-j})$
for $j \in \{0,1\}$, and this implies that $\hat g_0$ and $\hat g_1$ move towards
each other monotonically.

E. Symmetric density functions

Depending on the extra constraints imposed on the nominal
probability distributions, the equations that need to be solved
to determine the parameters of the LFDs can be simplified.
Assume $f_0(y) = f_1(-y)$ for all $y \in \mathbb{R}$ and $\varepsilon = \varepsilon_0 = \varepsilon_1$. This
implies $l_u = 1/l_l$. With this assumption Eq. (16) and Eq. (17)
reduce to

$$-\ln(z(l_u)) - \frac{1}{z(l_u)} \left( \frac{1}{2}\, l_u^{-1/2} \ln l_u \int_{1/l_u \le l \le l_u} \sqrt{f_0 f_1}\,d\mu + l_u^{-1} \ln l_u \int_{l>l_u} f_1\,d\mu \right) = \varepsilon \tag{31}$$

where

$$z(l_u) = \int_{l<1/l_u} f_1\,d\mu + l_u^{-1/2} \int_{1/l_u \le l \le l_u} \sqrt{f_0 f_1}\,d\mu + l_u^{-1} \int_{l>l_u} f_1\,d\mu. \tag{32}$$

The symmetry condition also implies $l(y) = 1/l(-y)$ and
$\hat l(y) = 1/\hat l(-y)$ for all $y$. Accordingly, it follows that
$k(l_l,l_u) = l_u^{-1}$ and $\hat g_0(y) = \hat g_1(-y)\ \forall y$. Notice that if $l$
is monotone, Eq. (31) can be redefined in terms of $y_u$ by
$l_u = l(y_u)$, $\{l > l_u\} = (y_u, \infty)$ and, due to symmetry,
$1/l_u = l(-y_u)$, $\{l < 1/l_u\} = (-\infty, -y_u)$. This proves that
Theorem II.1 is a generalization of the results of [7].

F. Asymptotically robust hypothesis test

So far, the problem of minimax robust hypothesis testing,
for the case where the objective function to maximize was the
error probability, has been studied. For the same uncertainty
model, Dabak and Johnson [6] proposed a geometrically
based robust detection scheme much earlier than [7]. From
[?, p. 254], it is also known that the work of Dabak can
be recreated by considering the same minimax optimization
problem that has been introduced, see (18), but changing the
objective functions $P_F$ and $P_M$ to $-D(g_0,g_1)$ and $-D(g_1,g_0)$.
Here, $D$ is again the relative entropy and $(\tilde g_0, \tilde g_1)$ are the least
favorable densities

$$\tilde g_0(y) = \frac{w(y;u)}{k(u)}, \qquad \tilde g_1(y) = \frac{w(y;1-v)}{k(1-v)} \tag{33}$$

where $u$, $v$ are parameters to be determined such that

$$D(\tilde g_0, f_0) = \varepsilon_0, \qquad D(\tilde g_1, f_1) = \varepsilon_1. \tag{34}$$

Again by [?], the fixed sample size test in the log domain

$$\sum_{i=1}^{n} \ln\left(\frac{f_1(y_i)}{f_0(y_i)}\right) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \frac{n}{1-(u+v)} \ln\left(\frac{k(1-v)}{k(u)}\right) \tag{35}$$

is still a likelihood ratio test, but with a modified threshold
($\gamma = 1$). The following proposition and the proof show that
$\tilde g_0$ and $\tilde g_1$ are indeed LFDs without consideration of the
geometrical aspects of hypothesis testing.

Proposition II.5. The pair of density functions $\tilde g_0$ and $\tilde g_1$
satisfy

$$\tilde g_0 = \arg\max_{g_0\in\mathcal{G}_0} E_{g_0}\left[\ln\left(\frac{\tilde g_1}{g_0}\right)\right], \qquad \tilde g_1 = \arg\min_{g_1\in\mathcal{G}_1} E_{g_1}\left[\ln\left(\frac{g_1}{\tilde g_0}\right)\right]. \tag{36}$$

Proof: Consider the Lagrangian function defined in (19),
where the objective functions $P_F$ and $P_M$ are replaced by
$E_{g_0} \ln(g_1/g_0)$ and $E_{g_1} \ln(g_1/g_0)$ from (36). Then, following
steps similar to (20)-(23), it can be shown that the solutions $g_0^*$ and $g_1^*$ have
the same parametric forms as given in (33). The equations
in (34) are convex [?], hence their solution is unique. Since
$(g_0^*, g_1^*)$ must satisfy (34) with the same $(\varepsilon_0, \varepsilon_1)$ that $(\tilde g_0, \tilde g_1)$
must satisfy, we have $g_0^* = \tilde g_0$ and $g_1^* = \tilde g_1$. ■

Note that $\tilde g_0$ and $\tilde g_1$ are denoted as least favorable densities
only in the sense that they are solutions to the equations in
(36). In the sequel, the statistical test based on the likelihood
ratio $\tilde g_1/\tilde g_0$ will be denoted as the (a)-test. The property defined
by (36) will be used in the next sections.
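For the symmetric case of Sec. II-E, the threshold can be found numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes nominals N(-1,1) and N(1,1), builds the least favorable density for the alternative from (13) with the symmetric substitutions (lower threshold equal to one over the upper threshold, and k equal to one over the upper threshold), normalizes it on a grid (the numerical counterpart of z), and bisects on the upper threshold until the KL constraint is met. All names, the grid and the bisection bounds are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean):
    """Density of N(mean, 1) on the array y."""
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)

def symmetric_lfd_g1(l_u, f0, f1, dy):
    """Least favorable density for H1 in the symmetric case:
    f1 for l < 1/l_u, sqrt(f0 f1 / l_u) in between, f1 / l_u for l > l_u,
    divided by the normalizing constant z."""
    l = f1 / f0
    unnorm = np.where(l < 1.0 / l_u, f1,
                      np.where(l > l_u, f1 / l_u, np.sqrt(f0 * f1 / l_u)))
    z = np.sum(unnorm) * dy            # grid approximation of z(l_u)
    return unnorm / z

def kl(g, f, dy):
    """Riemann-sum approximation of the KL-divergence D(g, f)."""
    return float(np.sum(g * np.log(g / f)) * dy)

def solve_l_u(eps, f0, f1, dy, hi=1e6, iters=200):
    """Geometric bisection on l_u so that D(g1_hat, f1) matches eps.
    Assumes eps lies between the divergences obtained at l_u = 1 and l_u = hi."""
    lo = 1.0
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if kl(symmetric_lfd_g1(mid, f0, f1, dy), f1, dy) < eps:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

y = np.linspace(-10.0, 10.0, 20001)
dy = y[1] - y[0]
f0, f1 = gaussian_pdf(y, -1.0), gaussian_pdf(y, 1.0)   # f0(y) = f1(-y)
eps = 0.05                                             # hypothetical robustness parameter
l_u = solve_l_u(eps, f0, f1, dy)
g1_hat = symmetric_lfd_g1(l_u, f0, f1, dy)
print(l_u, kl(g1_hat, f1, dy))
```

By symmetry the density for the null hypothesis is the mirror image, `g1_hat[::-1]` on this grid, so one scalar search determines both least favorable densities.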


G. Other distances 

The distance $D$ can be chosen in various ways based
on mathematical tractability or the practical application [18].
Symmetric distances are preferable due to their nice properties;
an example is the symmetric version of the relative entropy,
$D(f_0,f_1) + D(f_1,f_0)$. However, this distance does not yield
an analytic expression for the LFDs and the decision rule as

I 11 ((g) = W{e Zo5( - v)+Zl ) - W(e ZlS{y)+Z2 ) + z 3 S(y) 

needs to be solved to obtain the decision rule $\hat\delta(y)$ for $\hat l = 1$,
where $z_1$, $z_2$ and $z_3$ are constants and $W$ is the Lambert
$W$-function. The symmetrized $\chi^2$ distance, i.e. $\chi^2(f_0,f_1) +
\chi^2(f_1,f_0)$, is another example where the LFDs can be obtained
analytically. However, the relation between $y_u$ and
$l_u$, and similarly between $l_l$ and $y_l$, cannot be obtained
analytically. Another example of a symmetric distance is the
squared Hellinger distance. This distance is more appealing as
it scales in $[0,1]$ and it is mathematically tractable [12], [13].
For various robust tests, including those based on the relative entropy
distance, the $\chi^2$ distance and the squared Hellinger distance, the
likelihood ratio test is given by (15). For the symmetrized
$\chi^2$ distance, however, the test is slightly different as $\hat l$
is not a constant function for $\hat\delta \ne 0$ and $\hat\delta(y) \ne 1$, c.f.
Sec. VII. In general, designing a robust test is equivalent
to determining $\hat l = \psi(f_0,f_1)$ for some suitable functional
$\psi$ which accounts for the uncertainties not captured by the
nominal model while maintaining the detection performance
above a certain threshold.
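Two of the symmetric distances discussed above can be compared numerically. The sketch assumes unit-variance Gaussian nominals and the normalization of the squared Hellinger distance as one minus the Bhattacharyya coefficient, which is the convention that scales in [0,1]; the paper may use a different but equivalent scaling. Function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean):
    """Density of N(mean, 1) on the array y."""
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)

def squared_hellinger(f, g, dy):
    """1 - int sqrt(f g) dmu, which always lies in [0, 1]."""
    return float(1.0 - np.sum(np.sqrt(f * g)) * dy)

def symmetrized_kl(f, g, dy):
    """D(f, g) + D(g, f) = int (f - g) ln(f/g) dmu, which is unbounded above."""
    return float(np.sum((f - g) * np.log(f / g)) * dy)

y = np.linspace(-12.0, 12.0, 24001)
dy = y[1] - y[0]
f0, f1 = gaussian_pdf(y, -1.0), gaussian_pdf(y, 1.0)

h2 = squared_hellinger(f0, f1, dy)
skl = symmetrized_kl(f0, f1, dy)
print(h2, skl)
```

For these nominals the closed forms are 1 - exp(-1/2) for the squared Hellinger distance and 4 for the symmetrized relative entropy, which illustrates the bounded versus unbounded scaling mentioned in the text.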


III. Robust Detection for the Composite 
Uncertainty Model 

Minimax robust tests which are designed based on a
neighborhood set where every probability measure belonging
to the set is absolutely continuous with respect to the nominal
distribution, e.g. [6], [7], are more suitable for modeling
errors than the tests designed based on a neighborhood set
where not all distributions are absolutely continuous with
respect to the nominals, e.g. [5]; see [?] and [?]. In many
practical applications, however, both types of uncertainties,
namely modeling errors as well as outliers, can occur,
and a reasonable approach is to build a single test which is
uniformly minimax robust. This can be done by combining
one of Huber's clipped likelihood ratio tests [9] with a robust
test which is more suitable for modeling errors. The following
proposition explains how this can be done.

Proposition III.l. Let the inner uncertainty set be the ex¬ 
tended version of 0 . i-e- 

Gj = {9j ■ D(gj,fj) < Ej}, j £ { 0 , 1 } (37) 

where D is a convex distance (possibly different for each 
hypothesis), 0 < £j < 1 are some numbers and l = fi/fo 
is a monotone increasing function. Assume that there exist 
So £ Go an d Si € Gi corresponding to probability measures 
Gq and G\, respectively, such that 

Go[si/so < t] > Go[si/so < t] Vf £ R,Vso £ Go 


where qo and q± are the least favorable densities 


qo{y) = (1 - to)go(y) 

for 

gi(y)/go(y) < c u 

= (1/0(1 - e o)gi(y) 

for 

sii(y)/ 9 o(y) > c u 

qi(y) = (1 - £1)51(2/) 

for 

91(5)/90(5) > Cl 

= Ci (1 - ei)go{y) 

for 

91(9)/90(9) < Cl ( 41 ) 


corresponding to Qo and Qi, respectively. 

Proof: The proof follows directly from the definition of 
the uncertainty sets 


E 0 = {Qo : Qo[Y < tf] > (1 - e 0 )G 0 [Y < y] - v 0 } 
E 1 ={Q 1 :l- Qi[Y < y] > (1 - Cl )(l - G X [Y < y}) - j/J 

(42) 

with vo = v\ = 0 for e—contamination neighborhood ( 8 | 
and the stochastic ordering defined by Corollary. |II.3| Only 
the first inequality in ( [40] ) is proven as the second inequality 
can be proven using the same line of arguments. Let b = 
(1 — ei)/(l — eo). Then, for every t > bc u and Q 0 £ Eo, 
the event E = [q\/qo < f] has full probability and for every 
t < bci and Qo £ -To, the event E has null probability. Hence, 
<|40]) is trivially true for these cases. For bci < t < bc u , assume 
that the likelihood ratio Si/so is non-decreasing, which is true 
when l is monotone and the distance is either one of Huber’s 
distances 0 p. 271] or any distance with the likelihood ratio 
given by Eq. |T3] or in general a distance which results in a non 


decreasing l = gi/go for monotone l. Then, by Corollary II.3 
it follows that Gq[Y < t] > Gq\Y < t\ for all t = Z — 1 (j/). Let 
Qo[Y < t] := (1 — eo)Go[Y < f], Obviously Qo[Y < t] > 
Qo[Y < t] for all bci < t < bc u and Qo £ To- Note that for 
non-decreasing qi /On, di /dn is also non-decreasing. Hence, 


again by Corollary II.3 we get Qo[qi/qo <t}> Qo[qi/qo < f] 


for all t and Qo G Eq as claimed. ■ 

The proof is independent of the choice of D as long as the 
LFDs exist. When D is the relative entropy, it follows that 


Golgi/go <p] = 


< 


/ 9odfi + 

/ Sg 0 dg 

' \fh/9a<p] 

/[si/so=p] 

/ 9odfi + 

/ Sgodp 

' [ih/9o<p\ 

t[9i/go=p] 


= Go \gi /go < p) 


( 43 ) 


Gi[gi/go < t] < G\[gi/go < t] Vf £ M,Vgi £ Gi- ( 38 ) 

Define the composite uncertainty sets 

Ej = {Qj\Qj = (1— e j)Gj-\-CjHj, Hj £ , gj £ Gj}, j £ {0, 

( 39 ) 

where S is the set of all probability measures on and 

0 < eo)£i < 1- Then, there exist a pair of LFDs, (Qo,Qi) 
which satisfy the saddle value condition 

Qo[qi/qo <t]> Q 0 [qi/qo <t\ Vf £ K,VQ 0 £ E 0 
Qi[91/90 <t]< Qi[qi/qo <t\ Vf £ K, VQ 0 £ E 0 , ( 40 ) 

if Ej and ej are small enough, i.e., Eq and E\ do not overlap, 


and in a similar way Gi[g±/go < p] < G\\g\/go < p\. This 
proves that the uncertainty sets based on the e-contamination 
model and the relative entropy can be combined into a 
composite uncertainty set ( [39] ) which accepts LFDs, Qo and 
D Qi satisfying ( [40| . Clearly, the same conclusions hold when ;/<j 
and v-\ are non-zero. This includes the total variation distance 
as a special case with eo = = 0. Note that Prop. |V.1| is 

general for all thresholds. However, when the inner uncertainty 
set is the KL-divergence, the decision rule <5 must be used 
to guarantee minimax robustness. For a comparison, one can 
see that the composite model proposed in liTOI is robust only 
against outliers, with some flexibility, while the composite 
model proposed in this work is robust against both modeling 
errors as well as outliers. The LFDs, corresponding to the 
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composite model based on the relative entropy distance, can 
also be obtained as 


%Syfo(y), 

(1 —e 0 ) 1 , 


Ky) < h 


qo( y) = fr'Kv)) 11 h(y), h < l(y) < in 

iu< Kv ) < c u 
Ky) > c u 
Ky) < ci 
ci < Ky) < h 


Qi(y) = < 


1-e ° My), 

(l-6 0 )fc(h,L) f ( v \ 
c u z\h,i u ) J^yy/i 


z{h,l u ) 

(!—ei) 1 


SS Or 1 ^!/)) /i(y). h < Ky) < K 

(1 ~ ei)HllM h(y), 


z(h,l u ) 


Ky) > K 


(44) 


with the corresponding likelihood ratio 

bc -1 , l(y) < Cl 

7) 

h 


ci<l(y)<h 


Ky) = < 


h < Kv) < K 
i u < Ky) 

Kv) > 


i u <Ky)<cu 


be,. 


(45) 


The choice of D can be adjusted depending on the application. 
For instance symmetrized x 2 distance can be preferred if 
the tail structure is expected to be roughly preserved. It is 
also not difficult to see that for variety of distances, {44} 
remains the same. However, special care should be taken for 
the choice of b, since it is equivalent to p. In the sequel, D 
will be assumed to be KL-divergence with the LFDs given by 
unless mentioned otherwise. For example the parameters 
$\epsilon_0 = \epsilon_1 = 0$ indicate a pure KL-divergence uncertainty set
with the corresponding LFDs denoted by $\hat{Q}_0 := \hat{G}_0$ and
$\hat{Q}_1 := \hat{G}_1$. In the following, the corresponding test $d\hat{G}_1/d\hat{G}_0$
will be denoted as the (m)-test and similarly, the minimax
robust test for $\varepsilon_0 = \varepsilon_1 = 0$ will be denoted as the (h)-test and
the composite test will be denoted as the (c)-test.


A. Distribution of the log-likelihood ratios of LFDs 

In order to gain further insight into the minimax robust
tests and to evaluate their performance, it is desirable to have
the density function of the log-likelihood ratio of the LFDs, i.e.,
$h_j^* \sim \ln \hat{q}_1/\hat{q}_0(Y)$ when $Y \sim \hat{Q}_j$, as a function of the density
function of the log-likelihood ratio of the nominal distributions,
$h_j \sim \ln f_1/f_0(Y)$ when $Y \sim F_j$, $j \in \{0,1\}$. Then, for the
(h)-test, it follows that

$$ h_j^*(x) = r_j^0\,\delta_x(x - \ln(bc_l)) + (1-\epsilon_j)\,h_j(x - \ln b)\,\mathbf{1}_{\{\ln(bc_l) < x < \ln(bc_u)\}} + r_j^1\,\delta_x(x - \ln(bc_u)), \quad j \in \{0,1\}. \tag{46} $$


Similarly, for the (m)-test, 

ll T 

K (x) =— j — r -hg(x + \nli)l {x<0} + J x {x) 

l u k{li, l u ), , , , , n 

H-77— r^~bg(x + ln< u )lr x>0 i 

z{h,l u ) 

1 T 

K( x ) = 77 , M {x + hili)l {x<0} + S x (x) 

z{h,l u ) x z{li,l u ) 

+ - 77 — r\hi{x + lnZ„)l{ x>0 } (48) 

Z\H ; t u ) 

where 

f ln(fc(l, .l u )) 

r= (l-H) fdp. (49) 

Jli<KF 

It can be seen that Huber's test ((h)-test) creates two point
masses at the clipping thresholds $(\ln(bc_l), \ln(bc_u))$ and between
them the density of the log-likelihood ratio of the
nominal distributions is shifted by $\ln b$. The robust test based
on modeling errors ((m)-test), on the other hand, shifts the
density of the log-likelihood ratio of the nominal distributions
$(h_0, h_1)$ by $\ln(1/l_l)$ to the right and adds another part of the
same density, which is shifted by $\ln l_u$, to the left. The total
loss of area due to the shifting is stacked as a point mass at
$x = 0$.
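The point masses of the (h)-test can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes mean shifted Gaussian nominals $F_0 = \mathcal{N}(-1,1)$ and $F_1 = \mathcal{N}(1,1)$, so that $l(y) = e^{2y}$, and it obtains the clipping thresholds from Huber's equations written in the same form as the functions $h_1$ and $h_2$ used in the proof of Proposition III.3. The function names `huber_thresholds` and `point_masses` are hypothetical.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed nominals: F0 = N(-1, 1), F1 = N(1, 1), hence l(y) = exp(2 y).
def P0_below(c):
    """F0[l < c]."""
    return Phi(0.5 * math.log(c) + 1.0)

def P1_below(c):
    """F1[l < c]."""
    return Phi(0.5 * math.log(c) - 1.0)

def bisect(fun, lo, hi, iters=200):
    """Root of fun on [lo, hi], assuming a sign change on the interval."""
    flo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = fun(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def huber_thresholds(eps0, eps1):
    """Solve h1(c_l) = 1/(1-eps1) and h2(c_u) = 1/(1-eps0) in x = ln c."""
    def h1(x):
        c = math.exp(x)
        return (1.0 - P1_below(c)) + c * P0_below(c) - 1.0 / (1.0 - eps1)
    def h2(x):
        c = math.exp(x)
        return P0_below(c) + (1.0 - P1_below(c)) / c - 1.0 / (1.0 - eps0)
    return math.exp(bisect(h1, -30.0, 0.0)), math.exp(bisect(h2, 0.0, 30.0))

def point_masses(eps0, eps1):
    """Return c_l, c_u and the masses (r_0^0, r_0^1), (r_1^0, r_1^1) of (47)."""
    cl, cu = huber_thresholds(eps0, eps1)
    r0 = ((1 - eps0) * P0_below(cl), (1 - eps0) * (1 - P1_below(cu)) / cu)
    r1 = (cl * (1 - eps1) * P0_below(cl), (1 - eps1) * (1 - P1_below(cu)))
    return cl, cu, r0, r1
```

Under these assumptions the two point masses and the scaled continuous part between the thresholds add up to one under each hypothesis, which is a convenient sanity check on (46) and (47).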

The equations ( |46| ) and ( |48] > are of particular importance, first 
in calculating the false alarm and miss detection probabilities 
J hg(x)dx and J_, x h*(x)dx, respectively, and second in 
finding the approximate distribution of the test statistic S n = 
£2=1 In l(Yj), for n independent r.v.s Yj, Y 2 , ..., Y n , in terms 
of nominal distributions. However, to calculate the false alarm 
and miss detection probabilities, the factor of randomization, 
S in Eq. {48}, needs to be taken into account. That is, the 
contribution of the point mass at x = 0 to the false alarm and 
miss detection probabilities needs to be determined. 


B. Limiting robustness parameters for the (m)-test 


The composite hypotheses start overlapping when the LFDs
become identical. For the (m)-test, this occurs when $\mathcal{R}_1$ and
$\mathcal{R}_0$ are empty sets. Let $u = 1 + \ln(k(l_l, l_u))/\ln(l_u/l_l)$,
$w(y;u) = f_1(y)^u f_0(y)^{1-u}$ and $k(u) = \int_{\mathbb{R}} w(y;u)\,dy$. Then,
equations (16) and (17) reduce to

$$ \varepsilon_j(u) = -\ln k(u) + \frac{u-j}{k(u)} \int_{\mathbb{R}} w(y;u) \ln l(y)\,dy, \quad j \in \{0,1\}. \tag{50} $$


Proposition III.2. $\varepsilon_0$ is monotone increasing in $u$ and $\varepsilon_1$ is
monotone decreasing in $u$. Hence, $0 < \varepsilon_0 < D(f_1,f_0)$ and
$0 < \varepsilon_1 < D(f_0,f_1)$.

Proof: For $j = 0$, it follows that

$$ \varepsilon(u) = -\ln(k(u)) + \frac{u}{k(u)} \int_{\mathbb{R}} l(y)^u \ln(l(y)) f_0(y)\,dy. $$

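In (46), $\delta_x$ is a Dirac delta function and

$$ r_0^0 = (1-\epsilon_0)F_0[f_1/f_0 < c_l], \quad r_0^1 = \frac{1}{c_u}(1-\epsilon_0)F_1[f_1/f_0 > c_u], $$
$$ r_1^0 = c_l(1-\epsilon_1)F_0[f_1/f_0 < c_l], \quad r_1^1 = (1-\epsilon_1)F_1[f_1/f_0 > c_u]. \tag{47} $$

After manipulation, the first derivative of $\varepsilon(u)$ is

$$ \frac{d\varepsilon(u)}{du} = \frac{u}{k(u)^2}\left( k(u)\int_{\mathbb{R}} f_0(y)\,l(y)^u \ln(l(y))^2\,dy - \frac{dk(u)}{du}\int_{\mathbb{R}} l(y)^u f_0(y) \ln(l(y))\,dy \right). \tag{51} $$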
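Inserting $k(u)$ and $dk(u)/du$ and rearranging the terms yields

$$ \begin{aligned} \frac{k(u)^2}{u}\frac{d\varepsilon(u)}{du} &= \int_{\mathbb{R}} l(y)^u f_0(y)\,dy \int_{\mathbb{R}} l(y)^u f_0(y)\ln(l(y))^2\,dy - \int_{\mathbb{R}} l(y)^u f_0(y)\ln(l(y))\,dy \int_{\mathbb{R}} l(y)^u f_0(y)\ln(l(y))\,dy \\ &= \int_{\mathbb{R}} w(y;u)\,dy \int_{\mathbb{R}} w(y;u)\ln(l(y))^2\,dy - \left(\int_{\mathbb{R}} w(y;u)\ln(l(y))\,dy\right)^2. \end{aligned} \tag{52} $$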


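By Hölder's inequality, $w(y;u)$ is integrable over $\mathbb{R}$. Consider
the weighted $L^2$ space, $L^2_w(\mathbb{R})$, equipped with the inner product

$$ \langle g, h \rangle_w = \frac{\int_{\mathbb{R}} g(y)h(y)w(y)\,dy}{\int_{\mathbb{R}} w(y)\,dy} \tag{53} $$

and the resulting norm $\|g\|_w = \sqrt{\langle g, g\rangle_w}$. By definition, $g$
is in $L^2_w$ if $g^2 w$ is integrable over $\mathbb{R}$. Let $g(y) = \ln(l(y))$.
Dividing (52) by $(\int_{\mathbb{R}} w(y)\,dy)^2$ reads

$$ \frac{k(u)^2}{u\left(\int_{\mathbb{R}} w(y)\,dy\right)^2}\frac{d\varepsilon(u)}{du} = \|g\|_w^2 - \langle g, 1\rangle_w^2 > 0. \tag{54} $$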


The inequality follows from the Cauchy-Schwarz inequality
for the inner product $\langle g, h\rangle_w$ and it is strict since $g$ and
1 are linearly independent. What remains to be shown is that
$g$ belongs to $L^2_w$, i.e., $\int_{\mathbb{R}} g(y)^2 w(y)\,dy < \infty$. If $g$ is bounded,
the claim is obvious. If not, then either $\lim_{y\to\infty} l(y) = \infty$
or $\lim_{y\to-\infty} l(y) = 0$. Assume $\lim_{y\to\infty} l(y) = \infty$ and write

$$ \ln(l(y))^2 w(y) = (\ln(l(y)))^2\, l(y)^u f_0(y) = \frac{(\ln l(y))^2}{l(y)^{\frac{1-u}{2}}}\, f_1(y)^{\frac{1+u}{2}} f_0(y)^{\frac{1-u}{2}}. \tag{55} $$

By Hölder's inequality, the function $f_1(y)^{\frac{1+u}{2}} f_0(y)^{\frac{1-u}{2}}$ is integrable and
since

$$ \lim_{y\to\infty} \frac{(\ln l(y))^2}{l(y)^{\frac{1-u}{2}}} = 0, \tag{56} $$

$g(y)^2 w(y)$ is integrable over $[0,\infty)$ by comparison with
$f_1(y)^{\frac{1+u}{2}} f_0(y)^{\frac{1-u}{2}}$. If $\lim_{y\to-\infty} l(y) > 0$, then $g$ is bounded
on $(-\infty, 0]$ and integrability over $(-\infty, 0]$ follows. If
$\lim_{y\to-\infty} l(y) = 0$, then as

$$ \ln(l(y))^2 w(y) = (\ln(l(y)))^2\, l(y)^u f_0(y), \tag{57} $$

we have

$$ \lim_{y\to-\infty} (\ln(l(y)))^2\, l(y)^u = 0, \tag{58} $$

and integrability over $(-\infty, 0]$ follows by comparison with
$f_0$. In a similar way, $g(y)^2 w(y)$ is integrable over $\mathbb{R}$ if
$\lim_{y\to-\infty} l(y) = \infty$ or $\lim_{y\to\infty} l(y) = 0$. This completes the
proof that $d\varepsilon(u)/du > 0$ and hence $\varepsilon_0 < \varepsilon_0(1) = D(f_1,f_0)$.
For $\varepsilon_1$, let $u' = 1 - u$, $f_1 := f_0$ and $f_0 := f_1$. This gives
$\varepsilon_1(u') = \varepsilon_0(u)$, which implies that $\varepsilon_1(u')$ is increasing,
therefore $\varepsilon_1(u)$ is decreasing. Note that for $\varepsilon_1$, with the
substitutions of the densities, $l$ becomes decreasing, however
$g$ still belongs to $L^2_w$ and the proof is complete. ■

Prop. III.2 implies that (50) has a unique solution for all
$\varepsilon_j < D(f_{1-j}, f_j)$, $j \in \{0,1\}$. In particular, given a certain
choice of $\varepsilon_j$, the solution of Eq. (50) leads to $0 < u^* < 1$.
The corresponding maximum $\varepsilon_{1-j}$ is therefore obtained by
$\varepsilon_{1-j}(u^*)$. From (50), it also follows that

$$ \varepsilon_0(u) - \varepsilon_1(u) = \frac{1}{k(u)} \int_{\mathbb{R}} l(y)^u f_0(y)\ln(l(y))\,dy, \tag{59} $$

which is bounded as $-D(f_0,f_1) < \varepsilon_0(u) - \varepsilon_1(u) < D(f_1,f_0)$
due to monotonicity. When $\varepsilon = \varepsilon_0(u) = \varepsilon_1(u)$, this reduces
to

$$ \varepsilon = \sup_{0<u<1} -\ln \int_{\mathbb{R}} f_1(y)^u f_0(y)^{1-u}\,dy, \tag{60} $$

which is the Chernoff distance, and if additionally $f_0(y) = f_1(-y)\ \forall y$,
it further reduces to

$$ \varepsilon = -\ln \int_{\mathbb{R}} \sqrt{f_0(y) f_1(y)}\,dy, \tag{61} $$

which is the Bhattacharyya distance between the nominal
densities.
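As an illustration of (50) and Proposition III.2, the following sketch evaluates $\varepsilon_0(u)$ and $\varepsilon_1(u)$ by plain trapezoidal integration. The Gaussian nominals $\mathcal{N}(-1,1)$ and $\mathcal{N}(1,1)$, the integration range and the function name `eps_pair` are assumptions made for this example only. For this particular pair the integrals can also be done by hand, giving $\varepsilon_0(u) = 2u^2$ and $\varepsilon_1(u) = 2(1-u)^2$, which provides a reference for the numerical result.

```python
import math

def normal_pdf(y, mean, std):
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def eps_pair(u, m0=-1.0, s0=1.0, m1=1.0, s1=1.0, lo=-15.0, hi=15.0, n=6000):
    """Evaluate (50) with the trapezoidal rule; returns (eps_0(u), eps_1(u))."""
    dy = (hi - lo) / n
    k = 0.0    # k(u) = integral of w(y; u)
    wl = 0.0   # integral of w(y; u) * ln l(y)
    for i in range(n + 1):
        y = lo + i * dy
        f0 = normal_pdf(y, m0, s0)
        f1 = normal_pdf(y, m1, s1)
        w = f1 ** u * f0 ** (1.0 - u)
        weight = 0.5 if i in (0, n) else 1.0
        k += weight * w * dy
        wl += weight * w * math.log(f1 / f0) * dy
    eps0 = -math.log(k) + u * wl / k
    eps1 = -math.log(k) + (u - 1.0) * wl / k
    return eps0, eps1
```

Sweeping `u` over $(0,1)$ traces the curve of limiting robustness parameters; at $u = 1/2$ both values coincide with the Bhattacharyya distance in (61) for this symmetric pair.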


C. Limiting robustness parameters for the (h)-test 

Proposition III.3. The maximum achievable pair of $(\epsilon_0, \epsilon_1)$
with respect to the $\epsilon$-contamination model is obtained by

$$ (1 - \epsilon_0)\left(P_0[l < 1/b] - b\,P_1[l < 1/b]\right) = \epsilon_1 \tag{62} $$

where $b = (1 - \epsilon_1)/(1 - \epsilon_0)$.

Proof: By Huber, it is known that $h_1(c_l) =
P_1[p_1/p_0 > c_l] + c_l P_0[p_1/p_0 < c_l]$ is an increasing function
of $c_l$ and $h_2(c_u) = P_0[p_1/p_0 < c_u] + (1/c_u)P_1[p_1/p_0 > c_u]$,
in a similar manner, is a decreasing function of $c_u$. This
implies that $\epsilon_0$ and $\epsilon_1$ are maximized when $c_l$ is maximized
and $c_u$ is minimized. The maximum of $c_l$ is equal to the
minimum of $c_u$ such that the hypotheses do not overlap.
As a result, for $c = c_l = c_u$, it follows that $\hat{l}(y) = bc$
for all $y \in \mathbb{R}$. Since no density is greater than any other
for all $y \in \mathbb{R}$, the conclusion is that $c = 1/b$. Rewriting
the equations, $h_1(c := 1/b) = 1/(1 - \epsilon_1)$ or equivalently
$h_2(c := 1/b) = 1/(1 - \epsilon_0)$, completes the proof. ■

Let $u_0 = \operatorname{ess\,inf}_{\mu} l$, $u_1 = \operatorname{ess\,sup}_{\mu} l$, and let $k = 1 - \epsilon_0$
be known and $u = 1/(1 - \epsilon_1)$ to be determined. With these
substitutions, (62) can be written as $f(u) = 0$ with $f(u) = kuP_0[l < ku] -
P_1[l < ku] - u + 1$.

Lemma III.4. The function $f$ is continuous, $f(u) = 1 - u$
for $0 < ku < u_0$, is strictly decreasing for $ku < u_1$, tends to
$-\infty$ for $k < 1$ and tends to 0 for $k = 1$ as $u \uparrow \infty$.

Proof: We have

$$ f(u) = kuP_0[l < ku] - P_1[l < ku] - u + 1 = \int_{\{l<ku\}} (ku - l)\,p_0\,d\mu - u + 1. \tag{63} $$

Thus,

$$ f(u + \Delta) - f(u) = \int_{\{ku \le l < k(u+\Delta)\}} (ku + k\Delta - l)\,p_0\,d\mu + \Delta\left( k\int_{\{l<ku\}} p_0\,d\mu - 1 \right). \tag{64} $$

It then follows that

$$ \begin{aligned} f(u + \Delta) - f(u) &\le k\Delta P_0[ku \le l < k(u + \Delta)] + k\Delta P_0[l < ku] - \Delta \\ &= \Delta\left(kP_0[l < k(u + \Delta)] - 1\right) \le 0 \end{aligned} \tag{65} $$

for any positive $\Delta$ and for all $u > 0$. Since $f(u + \Delta) -
f(u) \ge \Delta(kP_0[l < ku] - 1) \ge -\Delta$, the conclusion is that
$0 \ge f(u + \Delta) - f(u) \ge -\Delta$, from where continuity and
monotonicity follow. For $ku < k(u + \Delta) < u_1$, it also follows
that $f(u + \Delta) - f(u) < 0$, hence, $f(u)$ is strictly decreasing,
tends to $-\infty$ for $k < 1$ and tends to 0 for $k = 1$ as $u \uparrow \infty$.


Lemma III.4 implies that $\epsilon_1$ can be determined uniquely for
all $0 < \epsilon_0 < 1$. Moreover, Lemma III.4 extends to the case
when $\epsilon_1$ is known and $\epsilon_0$ is variable due to the duality of the
parameters $\epsilon_0$ and $\epsilon_1$.
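Because $f$ is continuous and strictly decreasing, the maximum $\epsilon_1$ for a given $\epsilon_0$ can be found by bisection. The sketch below assumes the mean shifted Gaussian nominals $\mathcal{N}(-1,1)$ and $\mathcal{N}(1,1)$ (so $l(y) = e^{2y}$) and $0 < \epsilon_0 < 1$; the name `max_eps1` and the bracketing interval are choices made for this illustration. The bracket uses $f(1) \ge 0$, which follows from (63), and $f(u) \le 1 - (1-k)u$.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f(u, k):
    """f(u) = k u P0[l < k u] - P1[l < k u] - u + 1 for F0 = N(-1,1), F1 = N(1,1)."""
    a = 0.5 * math.log(k * u)   # l(y) = exp(2 y) < k u  <=>  y < a
    return k * u * Phi(a + 1.0) - Phi(a - 1.0) - u + 1.0

def max_eps1(eps0, iters=200):
    """Largest eps1 compatible with eps0 (0 < eps0 < 1), i.e. the root of f."""
    k = 1.0 - eps0
    lo, hi = 1.0, 1.0 / (1.0 - k) + 1.0   # f(lo) >= 0 and f(hi) < 0 for k < 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, k) > 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return 1.0 - 1.0 / u
```

For the symmetric choice $\epsilon_0 = \epsilon_1$ and $b = 1$, (62) can be solved by hand for these nominals, which gives a fixed point of `max_eps1` that can be used as a check.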


IV. Fixed sample size tests 


The robust version of the likelihood ratio test with respect to 
the uncertainty model o can be generalized to n independent 
samples, i.e. 

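$$ \hat{l}(\mathbf{y}) = \prod_{i=1}^{n} \hat{l}(y_i) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \gamma, \tag{66} $$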

which is equivalent to the nominal likelihood ratio test 


” «i // N S 


=i Si(v) 


(hr 7 


and similarly in the logarithmic scale 


(67) 


i—1 


n i 

^ n1n.li, 
Ho 


( 68 ) 


for 7 = 1 . Given the upper and lower thresholds, li and /,,, if 
S"=i ~ the original threshold of the nominal test is 
moved from 0 to nln/;, increasing the false alarm probability. 
Similarly, if YT=i Hlli) ~ the original threshold of the 
nominal test is moved to n In l u , which increases the miss 
detection probability. Let y = [y -\, •... y n ] be the observation 
vector and 6 = 1. Assume that there are n\ (and 712 ) 
observations in y whose likelihood ratios are clipped to (c/) 
(and c u ), respectively. Then, Huber’s clipped likelihood ratio 
test can be represented in the log domain as 


$$ \sum_{i=1}^{n-n_1-n_2} \ln(l(y_i)) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} -(n_1 \ln c_u + n_2 \ln c_l). \tag{69} $$
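A minimal sketch of the clipped test statistic in the log domain follows. Rather than counting clipped samples separately, it sums the clipped log-likelihood ratios directly, which is an equivalent way of organizing the computation. The function names, the default $b = 1$ and the threshold argument `log_gamma` are assumptions for this example.

```python
import math

def clipped_llr_sum(ys, llr, c_l, c_u, b=1.0):
    """Sum of ln(b l(y_i)) clipped to [ln(b c_l), ln(b c_u)].

    llr(y) must return the nominal log-likelihood ratio ln l(y).
    """
    lo, hi = math.log(b * c_l), math.log(b * c_u)
    total = 0.0
    for y in ys:
        total += min(max(math.log(b) + llr(y), lo), hi)
    return total

def decide(ys, llr, c_l, c_u, b=1.0, log_gamma=0.0):
    """Return 1 (decide H1) if the clipped statistic exceeds ln(gamma), else 0."""
    return 1 if clipped_llr_sum(ys, llr, c_l, c_u, b) > log_gamma else 0
```

For example, with $l(y) = e^{2y}$ one may pass `llr=lambda y: 2.0 * y`; observations with very large $|y|$ then contribute at most $\ln c_u$ or $\ln c_l$ to the sum, which is what limits the influence of outliers.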

Eventually, the robust test based on the composite model ( |39| ) 
can be given by 


where n\ and 712 are now due to clipping of the likelihood ratio 
given by The composite test combines the robustness 

properties of both the clipped likelihood ratio test and
the robust test for modeling errors. Single-sample robust
tests are extended to multiple samples through multiplication
of the likelihood ratios, owing to the independence of the
observations. Unlike Huber's robust test,
there is no stochastic ordering for the LFDs of modeling errors.
Hence, the composite model can be expected to be robust, but
minimax robustness is not guaranteed for $n > 1$.

A. Asymptotic performance analysis 

Large deviations theory can be used to analyze the asymptotic
performance of the robust tests. Consider the following
theorem by Cramér:

Theorem IV.1 (Cramér). Let $(Y_i)_{i \ge 1}$ be a sequence of i.i.d.
random variables, $S_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be their average sum and
$M_{Y_1}(u) := E[e^{uY_1}] < \infty$ be the moment generating function
of the r.v. $Y_1$. Then, for all $t > E[Y_1]$,

$$ \lim_{n\to\infty} \frac{1}{n} \ln P(S_n > t) = -I(t) \tag{71} $$

where the rate function $I$ is defined by

$$ I(t) := \sup_{u}\,\left(tu - \ln M_{Y_1}(u)\right), \tag{72} $$

which is the Legendre transform of the log moment generating
function.

Remark IV.1. Theorem IV.1 implies

$$ \lim_{n\to\infty} \frac{1}{n} \ln P(S_n < t) = -I(t) \tag{73} $$

for all $t < E[Y_1]$. To see this, take $X_i = -Y_i$ and consider

$$ P\left(\frac{1}{n}\sum_{i=1}^{n} X_i > -t\right). \tag{74} $$

Applying Cramér's theorem to the r.v. $X_1$ and the threshold
$-t$, it follows that $M_{Y_1}(u) = M_{X_1}(-u)$ and

$$ I(t) := \sup_{u}\,\left(tu - \ln M_{Y_1}(u)\right) = \sup_{u}\,\left(-tu - \ln M_{X_1}(u)\right). \tag{75} $$

Let $S_n^j := \frac{1}{n}\sum_{i=1}^{n} \ln \hat{l}(Y_i)$ with $\hat{l}(Y_i) = \hat{q}_1(Y_i)/\hat{q}_0(Y_i)$ for
$Y_i \sim \hat{Q}_0$ under $\mathcal{H}_0$ and for $Y_i \sim \hat{Q}_1$ under $\mathcal{H}_1$ for all $i \in
\{1,\ldots,n\}$. Furthermore, let the first and second type of error
probabilities be defined as $P_E^0(t) = P(S_n^0 > t)$ and $P_E^1(t) =
P(S_n^1 < t)$. Then, for all $E_{\hat{Q}_0}[\ln \hat{l}(Y_1)] < t < E_{\hat{Q}_1}[\ln \hat{l}(Y_1)]$, from
Theorem IV.1 and Remark IV.1,

$$ \lim_{n\to\infty} \frac{1}{n} \ln P_E^j(t) = -I_j(t), \quad j = 0,1, \tag{76} $$

where

$$ I_j(t) := \sup_{u}\,\left(tu - \ln M_j(u)\right), \quad j = 0,1, \tag{77} $$

n—n 1 — 712 


£ In 


i=1 


H 1 

^ -(711 In c u + n 2 Inc*). 

Ho 


with

$$ M_j(u) = \int_{\mathbb{R}} \hat{l}(y)^u\, \hat{q}_j(y)\,dy, \quad j = 0,1. \tag{78} $$


(70) 
Remark IV.2. Interestingly, if q\ = q\ and qo = qo , 
the parametric curve $(\varepsilon_0(u), \varepsilon_1(u))$ for $0 < u < 1$, (50)
with $f_0 := \hat{q}_0$ and $f_1 := \hat{q}_1$, implies $(I_0(t), I_1(t))$ for all
$E_{\hat{Q}_0}[\ln \hat{l}(Y_1)] < t < E_{\hat{Q}_1}[\ln \hat{l}(Y_1)]$. To prove this claim, observe
that in this case we have $M_1(u) = M_0(u + 1)$. Applying
this result to (77), taking the derivative of $tu - \ln M_j(u)$ with
respect to $u$, and rewriting $I_j$ in terms of the maximizing $u$ gives
(50) with the aforementioned substitutions. Since the mapping
from $0 < u < 1$ to $E_{\hat{Q}_0}[\ln \hat{l}(Y_1)] < t < E_{\hat{Q}_1}[\ln \hat{l}(Y_1)]$ is bijective,
as the derivative of a convex function $\ln M_j(u)$ [p. 77] is
increasing, the proof is complete.
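The rate functions in (77) can be evaluated numerically by maximizing the concave objective $tu - \ln M_j(u)$. The sketch below uses a ternary search and, purely as an assumption for illustration, takes the nominal densities $\mathcal{N}(-1,1)$ and $\mathcal{N}(1,1)$ in place of the LFDs, so that $\ln l(Y) = 2Y$ is $\mathcal{N}(-2,4)$ under $\mathcal{H}_0$ and the log moment generating function is available in closed form. The names `rate`, `log_M0` and `log_M1` are hypothetical.

```python
def rate(t, log_mgf, lo=-50.0, hi=50.0, iters=200):
    """I(t) = sup_u (t u - ln M(u)), found by ternary search on a concave objective."""
    g = lambda u: t * u - log_mgf(u)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            lo = a
        else:
            hi = b
    return g(0.5 * (lo + hi))

# Assumed example: ln l(Y) ~ N(-2, 4) under H0, so ln M_0(u) = -2u + 2u^2.
log_M0 = lambda u: -2.0 * u + 2.0 * u * u
# The relation M_1(u) = M_0(u + 1) used in Remark IV.2.
log_M1 = lambda u: log_M0(u + 1.0)
```

A direct consequence of $M_1(u) = M_0(u+1)$ is $I_0(t) - I_1(t) = t$, so the two rates coincide at $t = 0$, where for this example they equal the Chernoff distance in (60).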


B. Limiting tests 

1) Limiting (m)-test: The limiting case, lim; ( ^ in f; and 

lim/ u _,. SU p;, is of particular interest. For a single sample, 
the test becomes a pure randomized test having a success 
probability 6 which increases with l (hi}. For n independent 
samples, assume /; := 1 jl u and consider the normalization 
Inf (y) = (lnZ„ — lnl(y))/(lnl u — Ink). Then, as k j, 0 and 
l u t oo, the test statistic In l n (y) = T= l ^ (l/i) tends to 
5Z”=i which is the soft version of the sign test. 

2) Limiting (h)-test: The limiting test for Huber's clipped
likelihood ratio test is known to be the sign test 0. 

3) Limiting (a)-test: The limiting asymptotically robust test 
is again a likelihood ratio test with the threshold determined 
by u + v —> 1 in {35} . 

V. Robust sequential probability ratio test 

Sequential probability ratio tests (SPRT)s can be preferable 
over fixed sample size tests due to their strong optimality 
properties CD- Let S n = 5Z™_i Y). Then, for given target error 
probabilities of the first and second kind, a and ft respectively, 
by Wald [20], there exist an upper threshold $t_u > 1$ and a
lower threshold $0 < t_l < 1$ such that the SPRT continues taking
another sample if $t_l < S_n < t_u$, terminates and decides for
$\mathcal{H}_0$ if $S_n \le t_l$ and decides for the alternative hypothesis $\mathcal{H}_1$ if
$S_n \ge t_u$, for the first time $N = \min\{n : S_n \ge t_u \text{ or } S_n \le t_l\}$.
Furthermore, let the binary r.v. $v$ denote the decision of the
sequential test, i.e. $v = 1$ to decide for $\mathcal{H}_1$ and $v = 0$ for
$\mathcal{H}_0$. Similar to the fixed sample size test, a robust version
of the sequential test can be defined in terms of the nominal 
likelihood ratios and modified thresholds 


test is a design of two (random) functions mo and m\ such 
that both the expected number of samples, E[N], as well as 
the error probabilities of the first and second kind (a, 0) are 
bounded from above for all probability measures in the vicinity 
of the nominal distributions defined by a neighborhood of 
uncertainty. In the following, the robust tests that have already 
been designed or introduced are analyzed for the sequential 
test. Throughout design or analysis of a robust sequential 
test can be found for example in ED, where the probability 
distributions are assumed to be discrete with finite set of 
values, or in El, where Huber’s test is rigorously shown to 
be asymptotically robust. 

Let $S_n = \sum_{i=1}^{n} \ln \hat{l}(Y_i)$ be the test statistic where
$Y_1, Y_2, \ldots, Y_n$ are again i.i.d. and $\ln \hat{l}(Y_i)$ follows a probability
distribution $Q_j$, which accepts a continuous density
function $q_j$, when the true hypothesis is $\mathcal{H}_j$, $j \in \{0,1\}$. Let
furthermore

$$ h_{j,n}(y) = \frac{\partial}{\partial y} P_j[S_n \le y,\ S_1, \ldots, S_{n-1} \in (\ln t_l, \ln t_u)] \tag{81} $$

be the density function of $S_n$ under $\mathcal{H}_j$, $j \in \{0,1\}$, when all
$S_k$, $k < n$, are in $(\ln t_l, \ln t_u)$. Hence, the distribution of $N$
can be calculated recursively by

$$ P_j[N = n] = \int_{(-\infty,\ln t_l)\cup(\ln t_u,\infty)} h_{j,n}(y)\,dy, \qquad h_{j,n}(y) = \int_{\ln t_l}^{\ln t_u} h_{j,n-1}(u)\,q_j(y - u)\,du \tag{82} $$

with the initial condition $h_{j,1} = q_j$, $j \in \{0,1\}$, [23].
Accordingly, it follows that

$$ E_j[N] = \sum_{n=1}^{\infty} n P_j[N = n]. \tag{83} $$
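The recursion (82) and the expectation (83) can be approximated on a grid. The following sketch discretizes the continuation region with the midpoint rule and assumes Gaussian increments of the log-likelihood ratio; the function name `stopping_time_pmf`, the grid size and the truncation at `n_max` are assumptions made for this example, not part of the original analysis.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stopping_time_pmf(mean, std, log_tl, log_tu, n_max=60, m=100):
    """Approximate P[N = n], n = 1..n_max, for increments ~ N(mean, std^2)."""
    dx = (log_tu - log_tl) / m
    grid = [log_tl + (i + 0.5) * dx for i in range(m)]   # midpoints of the region
    def pdf(z):
        return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
    def stay(s):
        # probability that one more increment keeps the sum inside the region
        return Phi((log_tu - s - mean) / std) - Phi((log_tl - s - mean) / std)
    kernel = [[pdf(y - s) * dx for s in grid] for y in grid]
    exit_prob = [1.0 - stay(s) for s in grid]
    pmf = [1.0 - stay(0.0)]        # n = 1, starting from S_0 = 0
    h = [pdf(y) for y in grid]     # h_{j,1} = q_j restricted to the region
    for _ in range(2, n_max + 1):
        # mass of h_{j,n} outside the region, computed from h_{j,n-1}
        pmf.append(sum(hv * dx * e for hv, e in zip(h, exit_prob)))
        # h_{j,n} inside the region via the convolution in (82)
        h = [sum(kv * hv for kv, hv in zip(row, h)) for row in kernel]
    return pmf
```

The expected sample number then follows as `sum((n + 1) * p for n, p in enumerate(pmf))`, provided `n_max` is large enough that the remaining mass in the continuation region is negligible.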


Slightly modifying $P_j[N = n]$ by imposing the constraint that
the test will terminate either with the rejection or acceptance
of $\mathcal{H}_0$,

$$ P_0[N = n, v = 1] = \int_{(\ln t_u, \infty)} h_{0,n}(y)\,dy, \qquad P_1[N = n, v = 0] = \int_{(-\infty, \ln t_l)} h_{1,n}(y)\,dy, \tag{84} $$

we get


m(yi) In f < Y lnl (yi) < m(t/i) lnf„ (79) 


m(yi) = In h I n-YHyi) ) + ln luYJ(.yi)> ( 8 °) 

V i=t / i= 1 

for the (m)-test. Extensions to the (h)-test as well as to the (c)- 
test for the function m follow in a straightforward manner from 
( |69l > and {70}. However, it can be observed that all three robust 
tests are still some subsets of a possible design which considers 
two possibly different functions mo : O i —>• M and mi : fl i— > R. 
as multiplicands to the lower and upper thresholds. Hence, it 
can be concluded that a general design of a robust sequential 


$$ \alpha = \sum_{n=1}^{\infty} P_0[N = n, v = 1], \qquad \beta = \sum_{n=1}^{\infty} P_1[N = n, v = 0]. \tag{85} $$

Herein, $\alpha$, $\beta$, and $E_j[N]$ are all implicit functions of $(t_l, t_u)$
and $Q_j$, $j \in \{0,1\}$. When the notations are made explicit, a
minimax robust sequential test must satisfy

$$ \alpha_{\hat{Q}_0}[t_l, t_u] \ge \alpha_{Q_0}[t_l, t_u], \qquad \beta_{\hat{Q}_1}[t_l, t_u] \ge \beta_{Q_1}[t_l, t_u], \tag{86} $$

$$ E_{\hat{Q}_0}[N; t_l, t_u] \ge E_{Q_0}[N; t_l, t_u], \qquad E_{\hat{Q}_1}[N; t_l, t_u] \ge E_{Q_1}[N; t_l, t_u], \tag{87} $$

for all $(Q_0, Q_1) \in \mathcal{Q}_0 \times \mathcal{Q}_1$ and for all $(t_l, t_u)$.

The sequential (m)-test does not satisfy (86) and (87) even
asymptotically, i.e. when $t_l \to 0$ and $t_u \to \infty$, or equivalently
$\hat{Q}_0 \to \hat{Q}_1$ or $\hat{Q}_1 \to \hat{Q}_0$. This is due to the lack of stochastic
ordering between $Q_0$ and $\hat{Q}_0$, likewise between $Q_1$ and $\hat{Q}_1$.
Similarly, the sequential (a)-test does not satisfy (86)
(asymptotically) either. Again, asymptotically, the behavior of
the cumulative sums is determined by their non-random drift,
i.e., $S_N \approx N E_{Q_0}[\ln \hat{l}(Y)]$ or $S_N \approx N E_{Q_1}[\ln \hat{l}(Y)]$, and Wald's
approximations become exact, i.e., $E[S_N] \approx \ln t_l$ under $\mathcal{H}_0$
and $E[S_N] \approx \ln t_u$ under $\mathcal{H}_1$. Combining both conditions, it
follows that

$$ E_{Q_0}[N] \approx \frac{\ln t_l}{E_{Q_0}[\ln \hat{l}(Y)]}, \qquad E_{Q_1}[N] \approx \frac{\ln t_u}{E_{Q_1}[\ln \hat{l}(Y)]}. \tag{88} $$

From (36), it is known that (33) maximizes the right hand
sides of (88). Therefore, the sequential (a)-test satisfies (87)
asymptotically.

For the sequential (h)-test, it is known that (86) and (87)
are satisfied asymptotically. Additionally, a
counterexample has been given in the literature, which shows that (87) does not hold
in general, i.e., for all $(t_l, t_u)$. In the following, it is shown
that the sequential (h)-test satisfies (86) for all $(t_l, t_u)$.


Theorem V.1 (Coupling). Let $(X, Y)$ be a pair of random
variables on $(\Omega, \mathcal{A}, P)$ with $X \succeq_{st} Y$. On the same probability
space there exists another pair of random variables $(X', Y')$
such that $X' = X$ in distribution, $Y' = Y$ in distribution and
$X' \ge Y'$ almost surely.

Proof: Take $X' = X$ and $Y' = G^{-1}(F(X))$. Then,
$X' = X$ in distribution, $P[G^{-1}(F(X)) \le x] = P[F(X) \le
G(x)] = P[X \le F^{-1}(G(x))] = F[F^{-1}G(x)] = G(x) =:
P[Y \le x]$, so $Y' = Y$ in distribution, and since $P[Y' \le X'] =
P[G^{-1}F(X) \le X] = P[F(X) \le G(X)] = 1$, $X' \ge Y'$
almost surely. ■
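The construction $Y' = G^{-1}(F(X))$ in the proof can be tried out directly. In the sketch below the two distributions are exponential with rates 1 and 2, an assumption made only because both $F$ and $G^{-1}$ are then available in closed form and $G(y) \ge F(y)$ holds for all $y$; the function name `coupled_pair` is hypothetical.

```python
import math
import random

def coupled_pair(rng, rate_x=1.0, rate_y=2.0):
    """Draw X ~ Exp(rate_x) and return (X, Y') with Y' = G^{-1}(F(X)) ~ Exp(rate_y)."""
    x = rng.expovariate(rate_x)
    p = 1.0 - math.exp(-rate_x * x)       # F(X), uniform on (0, 1)
    y = -math.log(1.0 - p) / rate_y       # G^{-1}(F(X))
    return x, y
```

Every draw satisfies $Y' \le X$, while the marginal of $Y'$ is the exponential distribution with rate 2, so the ordering that held only in distribution now holds sample by sample.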

Proposition V.2. Let $X_i$ and $Y_i$ be two continuous random
variables on $\mathbb{R}$ having distribution functions $F$ and $G$, respectively,
and satisfying $G(y) \ge F(y)$ for all $y$. Furthermore, let
$S_n^X = \sum_{i=1}^{n} X_i$, $S_n^Y = \sum_{i=1}^{n} Y_i$, $A > 0$, and $B < 0$. Denote by
$\tau_A = \inf\{n \ge 0 : S_n \ge A\}$ and $\tau_B = \inf\{n \ge 0 : S_n \le B\}$
the hitting/stopping times of $S_n$ at the upper and lower
thresholds, respectively. Then,

$$ P_{S^X}[\tau_A < \tau_B] \ge P_{S^Y}[\tau_A < \tau_B]. \tag{89} $$

Proof: For a well defined comparison, exclude the cases
$X \equiv 0$ and $Y \equiv 0$ s.t. at least $\tau_A < \infty$ or $\tau_B < \infty$
almost surely and $\tau_A < \tau_B$ is well defined. The argument
$G(y) \ge F(y)$ for all $y$ implies $X \succeq_{st} Y$ and from Theorem V.1
there exists $(X', Y')$ such that $X' = X$, $Y' = Y$ in distribution
and $X' \ge Y'$ almost surely (a.s.). Consider the sequence of
i.i.d. random variables $(X'_n, Y'_n)_{n \ge 1}$ s.t. $(X'_1, Y'_1) = (X', Y')$
in distribution. Then, $(X'_n)_{n \ge 1} = (X_n)_{n \ge 1}$ and $(Y'_n)_{n \ge 1} =
(Y_n)_{n \ge 1}$ in distribution. Defining $S_n^{X'} = \sum_{i=1}^{n} X'_i$ and
$S_n^{Y'} = \sum_{i=1}^{n} Y'_i$, we also have $S_n^{X'} = S_n^X$ and $S_n^{Y'} = S_n^Y$ in
distribution. Since $X' \ge Y'$ a.s. and accordingly $X'_i \ge Y'_i$ a.s.
for all $i$, $S_n^{X'} \ge S_n^{Y'}$ a.s. Let $\tau_A^{X'} = \inf\{n \ge 0 : S_n^{X'} \ge A\}$
and define $\tau_A^{Y'}$, $\tau_B^{X'}$ and $\tau_B^{Y'}$ in the same way. Then $S_n^{Y'} \ge A$
implies $S_n^{X'} \ge A$ for all $n$, so $\tau_A^{X'} \le \tau_A^{Y'}$, and in the same
way $\tau_B^{X'} \ge \tau_B^{Y'}$. Hence,

$$ P_{S^X}(\tau_A < \tau_B) = P(\tau_A^{X'} < \tau_B^{X'}) \ge P(\tau_A^{Y'} < \tau_B^{Y'}) = P_{S^Y}(\tau_A < \tau_B). \tag{90} $$

Let $X \sim \hat{Q}_0$ and $Y \sim Q_0$, likewise $X \sim Q_1$ and $Y \sim \hat{Q}_1$,
with $A = \ln t_u$ and $B = \ln t_l$. Then, it is easy to see that (90)
is equivalent to (86) for any pair $(t_l, t_u)$. This result includes
not only the (h)-test, but also all tests in [8].
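The pathwise argument behind (90) can be illustrated with a small simulation. The sketch below assumes Gaussian increments and couples the two walks by a constant shift, which is what $G^{-1}(F(X))$ reduces to for two Gaussians with equal variance; the drift, the thresholds and the function name `coupled_exits` are choices made for this example.

```python
import random

def coupled_exits(rng, shift=0.4, A=3.0, B=-3.0, max_steps=100000):
    """Run two coupled random walks with X'_i = Y'_i + shift.

    Returns (exit_x, exit_y): True if the walk leaves (B, A) through A,
    False if it leaves through B, None if it has not left within max_steps.
    """
    sx = sy = 0.0
    exit_x = exit_y = None
    for _ in range(max_steps):
        x = rng.gauss(0.2, 1.0)
        sx += x
        sy += x - shift                    # Y'_i <= X'_i on every path
        if exit_x is None and (sx >= A or sx <= B):
            exit_x = sx >= A
        if exit_y is None and (sy >= A or sy <= B):
            exit_y = sy >= A
        if exit_x is not None and exit_y is not None:
            break
    return exit_x, exit_y
```

On every simulated path where the smaller walk leaves through the upper threshold, the larger walk does so as well, which is the event inclusion that yields the inequality in (90).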

For the expected number of samples, the requirement is 

E[ min-jrf ,t£ }] > F[min{r^ ,t\ }]. (91) 

This inequality does not have to hold in general. Intuitively,
however, it is expected to hold in the majority of
cases, especially when $t_l$ is small enough and $t_u$ is large
enough.


VI. Robust estimation 


The composite uncertainty model given in equation ( |39| ) 
extends to robust estimation problems. Let fe be a nominal 
probability density function corresponding to the distribution 
function F with parameters 0 = [0-\, ()■>, .... Vv]. In a general 
estimation framework, some parameters, possibly a sub-vector 
of 6 can be estimated well whereas some other parameters 
might not be, possibly due to a fast change of the parameters 
with time or due to the random nature of the parameters 
whose distributions are unknown. It is also possible that the 
known parameters might deviate slightly from the true values 
depending on the nature of the application or without regarding 
the parametric model, the shape of the distribution might be 
slightly different than expected, e.g. when there is lack of data 
but the CLT is assumed. In such cases, we have modeling 
errors that go unmodeled in addition to the outliers caused by 
some unexpected events. Therefore, it is desirable to design 
robust estimators which are not only able to deal with outliers 
but also with modeling errors, as given by m 
To account for the composite model, let T n (F) be a functional 
T n : Y n 4 K of R n -valued random variable Y" with 
i.i.d. replicas following a certain distribution F, i.e., for 
Y n = [Yi, Y 2 ,..., Y n \ each pair of r.v.s (Yi, Yf) with i f k 
are i.i.d., having a distribution function F. Then, it is desirable 
that linin^oQ T n (F) = 6 for some parameter 9 when F is 
the nominal distribution. Let Fr n and (fr„ be the distribution 
functions of T n when F and Q are the distribution functions of 
Y x , respectively. Then, it is also expected that for every e > 0, 
there exist <5 > 0 and an no > 0, such that for all n > no 
and Q £ IF, D(FT n ,QT n ) < e whenever D(F,Q ) < 6 for 
some metric D. This is a straightforward extension of Ham¬ 
pel’s equicontinuity theorem of robustness for the composite 
uncertainty model. Accordingly, the influence function can be 
modified as 


!F(y,T) 


lim sup 

e -*-°GeS 


T((l-e)G + eS x )-T(F) 


e 


(92) 
Fig. 1. Least favorable density functions, $\hat{q}_0$ and $\hat{q}_1$, for $\{\varepsilon_0 = 0.15, \varepsilon_1 =
0.05, \epsilon_0 = 0.02, \epsilon_1 = 0.02\}$ together with their corresponding nominal
density functions, $f_0$ and $f_1$, for $F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,2)$.

to account for the modeling errors in addition to the outliers. 
Similarly, the maximum bias as being another important metric 
to measure the robustness of an estimator can be obtained as 

$$ b(\epsilon) = \sup_{Q,G} |T(Q) - T(G)| = \sup_{G,H} |T((1 - \epsilon)G + \epsilon H) - T(G)|. \tag{93} $$

VII. Simulations 

In this section, simulations are performed in order to vi¬ 
sualize and validate the theoretical findings. Observations are 
assumed to be real valued. The formulations are general, there¬ 
fore, the observation space can be any discrete, continuous, 
finite or infinite set, with slight modifications for the discrete 
case. It can also be extended to the multidimensional case, but 
for large n, Monte-Carlo simulations may be required in order 
to solve the non-linear equations, c.f., 

In the first simulation, the composite uncertainty model (39)
with mean and variance shifted nominal distributions, $F_0 \sim
\mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,2)$, and the uncertainty parameters
$\{\varepsilon_0 = 0.15, \varepsilon_1 = 0.05, \epsilon_0 = 0.02, \epsilon_1 = 0.02\}$ is considered.
Note that for this choice of nominal distributions, neither is
$dF_1/dF_0$ monotone nor are the distributions symmetric with respect
to any point on their domain or codomain. In addition to
this, $\varepsilon_0 \ne \varepsilon_1$ is chosen so that the given example is general
enough for the solution of Equations (16) and (17). Regarding
the $\epsilon$-contamination part of the composite model, $\epsilon_0 = \epsilon_1$ is
chosen to be consistent with $\rho = 1$ for the uncertainty model
based on relative entropy. Accordingly, in Fig. 1 the LFDs
together with their nominal distributions are shown, whereas
in Fig. 2 the log-likelihood ratios of the nominal distributions,
the least favorable densities $(\hat{g}_0, \hat{g}_1)$ when $\{\varepsilon_0 = 0.15, \varepsilon_1 =
0.05, \epsilon_0 = \epsilon_1 = 0\}$ and the least favorable densities $(\hat{q}_0, \hat{q}_1)$
when $\{\varepsilon_0 = 0.15, \varepsilon_1 = 0.05, \epsilon_0 = 0.02, \epsilon_1 = 0.02\}$ are
shown.

In the second simulation, the mean shifted Gaussian distributions
$F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,1)$ are considered when
the closed balls are formed with respect to the symmetrized
$\chi^2$ distance with ($\varepsilon_0 = \varepsilon_1 = 0.08$) and a relative entropy
distance $D$ with ($\varepsilon_0 = \varepsilon_1 \approx 0.0087$). The parameters are

Fig. 2. Likelihood ratios of the nominal density functions, $f_0$ and $f_1$, least
favorable density functions, $\hat{g}_0$ and $\hat{g}_1$, with $\{\varepsilon_0 = 0.15, \varepsilon_1 = 0.05\}$
and composite least favorable density functions, $\hat{q}_0$ and $\hat{q}_1$, with $\{\varepsilon_0 =
0.15, \varepsilon_1 = 0.05, \epsilon_0 = 0.02, \epsilon_1 = 0.02\}$ for $F_0 \sim \mathcal{N}(-1,1)$ and
$F_1 \sim \mathcal{N}(1,2)$.

Fig. 3. The ratio of the likelihood ratio of the least favorable densities
$\hat{l} = \hat{g}_1/\hat{g}_0$ to the likelihood ratio of the nominal distributions $l = f_1/f_0$
for $F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,1)$, when the LFDs are based on the
symmetrized $\chi^2$ distance with $\varepsilon = \varepsilon_0 = \varepsilon_1 \approx 0.08$ and when the LFDs are
based on the KL-divergence with $\varepsilon \approx 0.0087$.


chosen such that the LFDs resulting from both distances
have equal relative entropy with respect to the nominal density
functions. Figure 3 illustrates $\hat{l}/l$, the ratio of the likelihood
ratios. It can be seen that there is a significant difference when
the $\chi^2$ distance is considered instead of the KL-divergence.
While this ratio tends to 1 as $\delta \to 0$ and $\delta \to 1$ for the
symmetrized $\chi^2$ distance, meaning that the tails of the density
functions are preserved, it is a constant $l_l < 1$ when $\delta = 0$ and
another constant $l_u > 1$ when $\delta = 1$ for the KL-divergence.
In the third simulation, again the same mean shifted Gaussian
distributions are considered. Of interest is the curvature of the
maximum robustness parameters for the (h)-test (62) versus
the (m)-test (50). Figure 4 illustrates the outcome of this
simulation.

In the fourth simulation, asymptotic decrease rates, Iq and 
h {73- of the type I and type II error probabilities are 
considered. The log-likelihood ratio test is built based on the
LFDs of the composite model, $\hat{l}(Y) = \hat{q}_1(Y)/\hat{q}_0(Y)$, with
parameters $\varepsilon_0 = \varepsilon_1 = 0.01$ and $\epsilon_0 = \epsilon_1 = 0.01$. The
r.v.s $Y_0, Y_1, \ldots, Y_n$, which are consistent with the observations
Fig. 4. Maximum achievable robustness parameters with respect to the (h)-test
and the (m)-test when the nominal distributions are $F_0 \sim \mathcal{N}(-1,1)$ and
$F_1 \sim \mathcal{N}(1,1)$.


$y_0, y_1, \ldots, y_n$, are i.i.d. The simulation is performed for six
different distributions of $Y_1$ for each hypothesis. Under $\mathcal{H}_j$, $Y_1$
is distributed as one of the following distributions: the nominal
distribution $F_j$ denoted by (n), the LFD $\hat{Q}_j$ with parameters
$\varepsilon_0 = \varepsilon_1 = 0.01$ denoted by (m), the LFD $\hat{Q}_j$ with parameters
$\epsilon_0 = \epsilon_1 = 0.01$ denoted by (h), the LFD $\hat{G}_j$ of the asymptotically
robust test with $\varepsilon_0 = \varepsilon_1 = 0.01$ denoted by (a), and the LFD of the
composite model $\hat{Q}_j$ with parameters $\varepsilon_0 = \varepsilon_1 = 0.01$ and
$\epsilon_0 = \epsilon_1 = 0.01$ denoted by (c), $j \in \{0,1\}$. For comparison,
a sixth LFD is introduced with respect to the
composite uncertainty set. The LFDs of the asymptotically
robust test, $\hat{G}_j$, for $\varepsilon_0 = \varepsilon_1 = 0.01$ are first obtained. Then, $\hat{Q}_j$
with $\epsilon_0 = \epsilon_1 = 0.01$ is determined when $\hat{G}_j$ is the nominal
distribution, $j \in \{0,1\}$. This test is denoted by (c*). Figure 5
and Fig. 6 illustrate $I_0$ and $I_1$ when $Y_1$ follows the various
distributions described above. The notation $|_a^b$ indicates that
the robust test is performed with the LFDs of the (a)-test and
the observations follow the LFD of the (b)-test. In general, the
composite test is not claimed to be asymptotically minimax
robust since the LFDs of the (m)-test are not asymptotically
robust. However, for this example, the (c)-test asymptotically
does not degrade its performance for all observation models,
when $t$ is small enough in its allowable limits. This test
corresponds to the type I Neyman-Pearson test, cf. [12].

In the fifth simulation, a single sample (m)-test © is 
considered, when the nominal distributions are mean shifted 
and mean and variance shifted Gaussian distributions as de¬ 
fined before. Robustness parameters are chosen to be equal 
($\varepsilon = \varepsilon_0 = \varepsilon_1$). For this choice, from (50), it follows that
$\varepsilon \in [0, 0.5]$ for the mean shifted Gaussian distributions and
$\varepsilon \in [0, 0.338]$ for the mean and variance shifted Gaussian
distributions s.t. the LFDs do not fully overlap. For all possible
choices of $\varepsilon$, the performance of this robust test was calculated
when the observations are due to the LFDs of the (m)-test (13)
and the LFDs of the (a)-test (33), which are determined
for the same $\varepsilon$ of the robust test. The rationale behind this
simulation is to test the minimax property defined by (9) and
(10). The choice of the (a)-test as a competitor to the (m)-test
is not arbitrary. First, the LFDs of both tests lie on the
boundary of the closed ball and second, the (a)-test is claimed


Fig. 5. Asymptotic decrease rate $I_0$ of the composite test when the observations
follow the nominal distributions $F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,2)$,
LFDs of the (m)-test, LFDs of the (a)-test, $\hat{g}_0$ and $\hat{g}_1$, with $\varepsilon_0 = \varepsilon_1 = 0.01$,
LFDs of the (h)-test, $\hat{q}_0$ and $\hat{q}_1$, with $\{\epsilon_0 = \epsilon_1 = 0.01, \varepsilon_0 = \varepsilon_1 = 0\}$,
LFDs of the (c)-test, $\hat{q}_0$ and $\hat{q}_1$, with $\{\epsilon_0 = \epsilon_1 = 0.01, \varepsilon_0 = \varepsilon_1 = 0.01\}$
and LFDs of the (c*)-test.

Fig. 6. Asymptotic decrease rate $I_1$ of the composite test when the observations
follow the nominal distributions $F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,2)$, LFDs
of the (m)-test, LFDs of the (a)-test, $\hat{g}_0$ and $\hat{g}_1$, with $\varepsilon_0 = \varepsilon_1 = 0.01$, LFDs
of the (h)-test, $\hat{q}_0$ and $\hat{q}_1$, with $\{\epsilon_0 = \epsilon_1 = 0.01, \varepsilon_0 = \varepsilon_1 = 0\}$, LFDs of
the (c)-test, $\hat{q}_0$ and $\hat{q}_1$, with $\{\epsilon_0 = \epsilon_1 = 0.01, \varepsilon_0 = \varepsilon_1 = 0.01\}$ and LFDs
of the (c*)-test.


to be asymptotically robust for large enough $n$. Figure 7
illustrates the outcome of this simulation for the mean shifted
Gaussian distributions. Due to the symmetry of the nominal
distributions and the equal choice of the robustness parameters,
we have $P_F = P_M = P_E$. It can be seen that the robust
test does not degrade its performance, as expected. Similarly,
in Fig. 8 the result of the same simulation for the mean
and variance shifted Gaussian distributions is given. Since the
nominal distributions are not symmetric, the error probabilities
($P_F$ and $P_M$) are unequal. More interestingly, as illustrated in
Fig. 9, the false alarm probability first increases with $\varepsilon$ and
then starts decreasing. In all cases, it can be seen that (9) and
(10) are valid.
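The two upper limits quoted for $\varepsilon$ in the fifth simulation can be cross-checked against the Chernoff distance (60), which for Gaussian densities has a closed form. The sketch below is an illustration under one explicit assumption: the second argument of $\mathcal{N}(\cdot,\cdot)$ is read as a standard deviation, so that $\mathcal{N}(1,2)$ has standard deviation 2. The function names and the grid search over $u$ are likewise choices made for this example.

```python
import math

def neg_log_k(u, m0, s0, m1, s1):
    """-ln of the integral of f1^u f0^(1-u) for f0 = N(m0, s0^2), f1 = N(m1, s1^2)."""
    v = u * s0 ** 2 + (1.0 - u) * s1 ** 2
    quad = u * (1.0 - u) * (m1 - m0) ** 2 / (2.0 * v)
    scale = 0.5 * math.log(v / (s1 ** (2.0 * (1.0 - u)) * s0 ** (2.0 * u)))
    return quad + scale

def chernoff(m0, s0, m1, s1, grid=100000):
    """Chernoff distance (60), maximized over a uniform grid of u in (0, 1)."""
    return max(neg_log_k(i / grid, m0, s0, m1, s1) for i in range(1, grid))
```

Calling `chernoff(-1, 1, 1, 1)` and `chernoff(-1, 1, 1, 2)` evaluates the limit for the mean shifted pair and for the mean and variance shifted pair, respectively, under the stated reading of the parameters.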

The last part of the simulations is related to the robustness 
of the sequential probability ratio test based on the likelihood 
Fig. 7. The performance of the robust decision rule for all achievable
least favorable distributions of the (m)-test and the (a)-test with $\varepsilon = \varepsilon_0 = \varepsilon_1$,
when the nominal distributions are $F_0 \sim \mathcal{N}(-1,1)$ and $F_1 \sim \mathcal{N}(1,1)$.


ratio between LFDs obtained by single sample robust tests. 
The robustness of the composite model strictly depends on
the robustness of each single model: the sequential (m)-test
and the sequential (h)-test. If one of them fails to be
minimax robust, then the composite model is not minimax
robust either. Analyzing the robustness of the sequential
(m)-test and the sequential (h)-test is therefore general
enough to draw conclusions about the composite test. In the
sequel, Monte-Carlo simulations have been performed with
$10^5$ samples. The threshold space $(\ln t_l, \ln t_u) \in \mathbb{R}^- \times \mathbb{R}^+$ is
first cropped to $[-6, 0] \times [0, 6]$ and then discretized with a step
parameter of 0.01 in both directions, leading to $60 \times 60$ pairs
of $(\ln t_l, \ln t_u)$. The nominal distributions are selected to be
the mean and variance shifted Gaussian distributions as before. 
For $\varepsilon = \varepsilon_0 = \varepsilon_1 = 0.01$, the LFDs of the (m)-test $(g_0^{(m)}, g_1^{(m)})$ and
the (a)-test $(g_0^{(a)}, g_1^{(a)})$ are determined by solving their respective
defining equations. Accordingly, the likelihood ratio is formed by $l^{(m)} = g_1^{(m)}/g_0^{(m)}$
or $l^{(a)} = g_1^{(a)}/g_0^{(a)}$. The tests considered are $S_n^{(m)} = \sum_{i=1}^{n} l^{(m)}(Y_i)$ and
$S_n^{(a)} = \sum_{i=1}^{n} l^{(a)}(Y_i)$, where every $Y_i$ is distributed either as $g_0^{(m)}$ or
$g_0^{(a)}$ under $H_0$ and either as $g_1^{(m)}$ or $g_1^{(a)}$ under $H_1$. For every pair of
thresholds $(\ln t_l, \ln t_u)$, the sequential test is run and the false
alarm probability, the miss detection probability and the expected
number of samples under $H_0$ and $H_1$ are calculated. Figure 10
illustrates the ratio of the false alarm probability of the (m)-test
when the observations follow the LFD of the (a)-test to its false alarm
probability when they follow the LFD of the (m)-test. Clearly, the
performance of $S_n^{(m)}$, designed for $Y_i \sim g_0^{(m)}$, degrades for almost all
simulation points if actually $Y_i \sim g_0^{(a)}$. Figure 11 illustrates similar
results for the miss detection probability when the robust test is
$S_n^{(a)} = \sum_{i=1}^{n} l^{(a)}(Y_i)$.
Again, the test does not satisfy the bounded error probability
condition. Figures 12 to 15 illustrate the same type of simulations
for the expected number of samples, where similar observations
can be made. In conclusion, the sequential (m)-test is robust
neither for the error probabilities nor for the expected number
of samples, whereas the sequential (a)-test is only asymptotically
robust for the expected number of samples. The simulation results
are in agreement with the theoretical findings. A short comparison
of the (m)-test, the (a)-test and the (h)-test is given in Table I.
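
The simulation procedure described above (a grid of threshold pairs, one sequential test per pair, empirical error probabilities and sample counts) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the nominal densities $\mathcal{N}(-1, 1)$ and $\mathcal{N}(1, 2)$ stand in for the LFDs, whose defining equations are not part of this passage; the second argument of $\mathcal{N}$ is read as a variance; the test statistic is assumed to accumulate log-likelihood ratios that are compared with $\ln t_l$ and $\ln t_u$; the number of runs is reduced; and all function names are hypothetical. Since 60 points per axis over an interval of length 6 correspond to a spacing of 0.1, the grid is parameterized by the number of points rather than by the step.

```python
import math
import random

def llr(y):
    """ln f1(y)/f0(y) for f0 = N(-1, 1) and f1 = N(1, 2), variance convention assumed."""
    return -0.5 * math.log(2.0) - (y - 1.0) ** 2 / 4.0 + (y + 1.0) ** 2 / 2.0

def run_sprt(ln_tl, ln_tu, sample, rng, max_steps=10000):
    """Run one sequential test; return (decided_H1, number_of_samples)."""
    s = 0.0
    for n in range(1, max_steps + 1):
        s += llr(sample(rng))
        if s >= ln_tu:          # upper threshold crossed: accept H1
            return True, n
        if s <= ln_tl:          # lower threshold crossed: accept H0
            return False, n
    return s > 0.0, max_steps   # truncation fallback, rarely reached

def simulate(ln_tl, ln_tu, runs=2000, seed=0):
    """Estimate false alarm, miss detection, E0[N] and E1[N] for one threshold pair."""
    rng = random.Random(seed)
    draw0 = lambda r: r.gauss(-1.0, 1.0)             # observations under H0
    draw1 = lambda r: r.gauss(1.0, math.sqrt(2.0))   # observations under H1
    res0 = [run_sprt(ln_tl, ln_tu, draw0, rng) for _ in range(runs)]
    res1 = [run_sprt(ln_tl, ln_tu, draw1, rng) for _ in range(runs)]
    alpha = sum(d for d, _ in res0) / runs
    beta = sum(not d for d, _ in res1) / runs
    mean_n0 = sum(n for _, n in res0) / runs
    mean_n1 = sum(n for _, n in res1) / runs
    return alpha, beta, mean_n0, mean_n1

def threshold_grid(points=60, span=6.0):
    """Pairs (ln t_l, ln t_u) on [-span, 0) x (0, span], excluding the degenerate value 0."""
    step = span / points
    lower = [-span + i * step for i in range(points)]
    upper = [(i + 1) * step for i in range(points)]
    return [(a, b) for a in lower for b in upper]

if __name__ == "__main__":
    print(len(threshold_grid()))
    print(simulate(-3.0, 3.0))
```

To reproduce ratio plots of the kind shown in Figs. 10 to 15, one would call `simulate` twice per grid point, once with observations drawn from each of the two candidate distributions, and divide the resulting quantities; that step requires the actual LFDs and is therefore left out of the sketch.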





Fig. 8. The performance of the robust decision rule for all achievable
least favorable distributions of the (m)-test and the (a)-test with $\varepsilon = \varepsilon_0 = \varepsilon_1$,
when the nominal distributions are $F_0 \sim \mathcal{N}(-1, 1)$ and $F_1 \sim \mathcal{N}(1, 2)$.



Fig. 9. False alarm probability of the robust decision rule for all
achievable least favorable distributions of the (m)-test and the (a)-test with
$\varepsilon = \varepsilon_0 = \varepsilon_1$, when the nominal distributions are $F_0 \sim \mathcal{N}(-1, 1)$ and
$F_1 \sim \mathcal{N}(1, 2)$.


VIII. Conclusion 

A minimax robust hypothesis testing scheme between two 
composite hypotheses based on the KL-divergence has been 
proposed. It has been shown that the proposed model reduces 
to Levy’s robust test 0 when the nominal likelihood ratio is 
monotone and the nominal probability density functions are 
symmetric. For comparison purposes, Dabak’s asymptotically 
robust test 0 has been introduced and the existence of LFDs 
for this test has been proven without consideration of the 
geometrical aspects of hypothesis testing. It has been shown 
that the proposed minimax robust test, the (m)-test, can be 
combined with Huber’s clipped likelihood ratio test, the (h)-test,
in a composite uncertainty model. Hence, the composite
test, the (c)-test, provides minimax robustness against both
outliers and modeling errors. The existence of LFDs for
the composite uncertainty model has also been proven. It has 
been demonstrated that the proposed composite model reduces 
to the individual robust tests via a suitable choice of the 
parameters. 

To design a robust test for modeling errors, the uncertainty sets
can be constructed by choosing distances different from the
KL-divergence. It has been shown that the choice of a distance
plays a crucial role in designing the robust tests. Although the
robust version of the likelihood ratio test remains the same for
many distances, there are examples where this assertion is not
true. Among several distances discussed, the symmetrized $\chi^2$
distance has been found to be more suitable for the design of a robust
hypothesis test if the tail structures of the nominal distributions
need to be roughly preserved. It has also been shown that
the maximum robustness parameters are bounded from above.
For both the (m)-test and the (h)-test, the problem
of determining the maximum robustness parameters is proven
to be a convex optimization problem, and therefore the related
equations can be solved by a polynomial-time algorithm.
Next, the single sample robust tests have been extended to
fixed sample size tests. Cramér’s theorem has been adopted
to characterize the asymptotic behavior of the robust tests.
Interestingly, it has been found that the formulation of the
asymptotic decrease rate of the error probability for the fixed
sample size test coincides with the formulation to determine
the maximum robustness parameters for the (m)-test. Later, the
single sample robust tests have been extended to the sequential
hypothesis test. The minimax properties of the considered
robust tests have either been proven or disproven analytically
or with simulations. Finally, it has been shown that the proposed
composite model is applicable to robust estimation problems.
The simulation results agree with the theoretical findings.

Fig. 10. The ratio of the false alarm probabilities of the (m)-test when the
observations follow the LFD of the (a)-test ($G_0$) and the LFD of the (m)-test
($G_0$).

Fig. 11. The ratio of the miss detection probabilities of the (a)-test when the
observations follow the LFD of the (m)-test ($G_0$) and the LFD of the (a)-test
($G_0$).

Fig. 12. The ratio of the expected number of samples of the (m)-test when
the observations follow the LFD of the (a)-test ($G_0$) and the LFD of the
(m)-test ($G_0$).

Fig. 13. The ratio of the expected number of samples of the (m)-test when
the observations follow the LFD of the (a)-test ($G_1$) and the LFD of the
(m)-test ($G_1$).

Fig. 14. The ratio of the expected number of samples of the (a)-test when
the observations follow the LFD of the (m)-test ($G_0$) and the LFD of the
(a)-test ($G_0$).

Fig. 15. The ratio of the expected number of samples of the (a)-test when
the observations follow the LFD of the (m)-test ($G_1$) and the LFD of the
(a)-test ($G_1$).

TABLE I

Comparison between the robust tests

                                  (m)-test           (a)-test                 (h)-test
Unique LFDs                       Yes                Yes                      No [5]
Unique test                       Yes                Yes                      Yes
Limiting test                     Soft sign test     Likelihood ratio test    Sign test
Suitable for                      Modeling errors    Modeling errors          Outliers
Non-linear equations              Two coupled        Two distinct             Two distinct
Number of samples                 $n = 1$            $n \to \infty$           $1 \le n < \infty$
Fixed sample size test            Not robust         Asymp. robust [6]        Robust
Sequential test, $(\alpha, \beta)$   Not robust         Not robust               Robust
Sequential test, $E[N]$           Not robust         Asymp. robust            Asymp. robust